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We introduce a generalization of the parallel, or Crow-Kimura, and Eigen models of molecular 
evolution to represent the exchange of genetic information between individuals in a population. 
We study the effect of different schemes of genetic recombination on the steady-state mean fitness 
and distribution of individuals in the population, through an analytic field theoretic mapping. We 
investigate both horizontal gene transfer from a population and recombination between pairs of 
individuals. Somewhat surprisingly, these nonlinear generalizations of quasi-species theory to mod- 
ern biology are analytically solvable. For two-parent recombination, we find two selected phases, 
' one of which is spectrally rigid. We present exact analytical formulas for the equilibrium mean 

, fitness of the population, in terms of a maximum principle, which are generally applicable to any 

permutation invariant replication rate function. For smooth fitness landscapes, we show that when 
positive epistatic interactions are present, recombination or horizontal gene transfer introduces a 
^ mild load against selection. Conversely, if the fitness landscape exhibits negative epistasis, horizon- 

. tal gene transfer or recombination introduce an advantage by enhancing selection towards the fittest 

' genotypes. These results prove that the mutational deterministic hypothesis holds for quasi-species 

models. For the discontinuous single sharp peak fitness landscape, we show that horizontal gene 
transfer has no effect on the fitness, while recombination decreases the fitness, for both the parallel 
and the Eigen models. We present numerical and analytical results as well as phase diagrams for 
' the different cases. 
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I. INTRODUCTION 



It has been argued that genetic recombination provides a mechanism to speed up evolution, at least in finite 
] populations [ij. Moreover, it has been suggested that recombination may provide a way to escape from the phenomenon 
^ , of "MuUer's ratchet" 0], or suboptimal fitness characteristic of finite populations with asexual reproduction. In 
ly-j ' bacteria, it has been proposed (3| that horizontal gene transfer allows for the gradual emergence of modularity, 
, through the formation of gene clusters and their eventual organization into operons. In in-vitro systems, protein 
fS| ' engineering protocols by directed evolution incorporate genetic recombination in the form of DNA shuffling [j, Q to 
speed up the search for desired features such as high binding constants among combinatorial libraries of mutants. 
I ■ Besides these inherently dynamical effects, it remains a matter of debate if the exchange of genetic-encoding elements 
\ provides a long-term advantage to an infinite population in a nearly static environment. Indeed, it is argued that Q 
. when advantageous genetic associations have been generated as a result of selection in a given environment, further 
^ ' random recombination is likely to disrupt these associations, thus decreasing the overall fitness. This argument is 
• i-H . less cogent if we consider that recombination and horizontal gene transfer preserve the modular structure of the 
' genetic material Q. That is, entire operational and functional units are recombined, rather than random pieces. It 
^ , has also been proposed that for recombination to introduce an advantage in infinite populations, negative linkage 
■ - - ' disequilibrium is required 0, S [lOt . This situation means that particular allele combinations are present in 
the population at a lower frequency than predicted by chance. Negative linkage disequilibrium can result as a 
consequence of negative epistasis: alleles with negative contributions to the fitness interact synergistically, increasing 
their deleterious effect when combined, and alleles with positive contributions to the fitness interact antagonistically 
0, [m, [3, see Fig. [T] Under negative epistasis, the mutational deterministic hypothesis 0, S [13, [IHj H, [3 
postulates that recombination promotes a more efflcient removal of deleterious mutations, by bringing them together 
into single genomes, and hence facilitating selection [l^, [l^ to discard those genotypes with low fitness. It has been 
argued that the negative linkage disequilibrium generated by negative epistatic interactions is a factor to promote the 
evolution of recombination in nature 0, [Tsl . , and conversely that recombination may act as a mechanism to evolve 
epistasis [3, [l^[2^. This later statement is controversial, since it is intuitive that recombination should contribute 
to weaken correlations between different genes (2]| . Despite these theoretical arguments, experimental studies seem 
to indicate that negative epistasis is not so common in nature [2^ |23| as recombination and, moreover, both negative 
and positive epistasis may coexist as different fitness components [7[ within the same genome in natural organisms. 

To address some of these questions, we study the effect of transferring genetic information between different or- 
ganisms in an infinite population. We choose the conceptual framework of "quasi-species" theory, represented by two 
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classical models of molecular evolution: the Eigen 0, [2^, H^, [13] model and the parallel, or Crow-Kimura, model 
m,!!^. These classical models include the basic processes of mutation, selection, and replication that occur in biolog- 
ical evolution. Our goal is to solve these two standard models of quasi-species theory, Crow-Kimura and Eigen, when 
horizontal gene transfer or recombination are included. Since horizontal gene transfer and recombination are essential 
features of evolutionary biology, our solutions bring quasi-species theory closer to modern biology. An operational 
definition of fitness is provided in these models by the replication rate, which is considered to be a function of the geno- 
type. In their simplest formulation quasi-species models consider a static environment, with a deterministic mapping 
between individual genetic sequences and replication rate. Both the Eigen j24i . i25j and the parallel, or Crow-Kimura 
model [28j . are formulated in terms of a large system of differential equations, describing the time evolution of the 
relative frequencies of the different sequence types in an infinite population, a mathematical language that is common 
in the field of chemical kinetics [l^, H^. Sequences, representing information carrying molecules such as RNA or 
DNA, are assumed to be drawn from a binary alphabet (e.g. purines/pyrimidines). The most remarkable property of 
these classical models is that when the mutation rate is below a critical value they exhibit a phase transition in the 
infinite genome limit [il, [H, US [13, US lU, [H, [H, [s^l , with the emergence of a self-organized phase: the quasi-species 
[13, l25l. l26l|. This organized phase, characterized by a collection of nearly neutral mutants rather than by a single 
homogeneous sequence type, is mainly a consequence of the auto-catalytic character of the evolution dynamics, which 
tends to enrich exponentially the proportion of fittest individuals in the population 0, d^, d^, [13| . The quasi-species 
concept, with its corresponding "error threshold" transition, has been applied in the interpretation of experimental 
studies in RNA viruses ^35,, .36,, ,37l. [38| . In particular, the error-threshold transition has been proposed as a theoretical 
motivation for an antiviral strategy [39| , termed " lethal mutagenesis" , which drives an infecting population of viruses 
towards extinction by enhancing their mutation rate pol. [41114^. It has been argued, however, that the mechanism 
for lethal mutagenesis possesses a strong ecological component [i^, and that perhaps the mean population fitness 
is simply driven negative, and so the total number of viral particles in an infecting population decreases in time 
towards extinction, in contrast with error-threshold theories that describe a randomization of the composition of the 
quasi-species in genotype space. 

The existence of the error threshold transition has motivated the attention of theoretical physicists, especially since 
it was proved that the quasi-species theory can be exactly mapped into an 2D Ising spin system [sol . [3l| , with a phase 
transition that is first order for a sharp peak fitness, and second or higher order for smooth fitness functions. More 
recently, exact mappings into a quantum spin chain [3, HE, EE 113, El] or field theoretic representations [s^ have been 
developed. Analytical and numerical studies of these systems, in the large genome limit, are possible when the fitness 
function is considered to be permutation invariant [3^.l33l. H . [isl , E^ , or depending on the overlap with several peaks 
in sequence space [s^]- The mapping of the quasi-species models into a physical system allows for the application of 
the powerful mathematical techniques of statistical mechanics, thus obtaining exact analytical solutions which provide 
significant insight over numerical studies [H, [13, [IS] ■ Most of the existing analytical solutions correspond to the case 
when recombination is absent. Recombination and horizontal gene transfer have been studied by computer simulations 
of artificial gene networks [ll| and digital organisms [1] , but relatively few analytical approaches have been reported 
in the context of quasi-species theory P, |49| , [sol , [sTj . A numerical study of a mathematical model for viral super- 
infection termed uniform crossover, and intermediate between horizontal gene transfer and recombination, has been 
reported [s^, with numerical solutions based on relatively short viral sequences (N=:15). More recently, the effect of 
incorporating horizontal gene transfer in quasi-species theory has been studied in terms of the dynamics [l[ , reporting 
numerical studies and approximate analytical expressions. Exact analytical expressions for the equilibrium properties 
of the population in the presence of horizontal gene transfer have been derived using the methods of quantum field 
theory |49j . 

In this article, we study the effect of introducing different schemes of genetic recombination in quasi-species theory. 
Extending the results in [IS], we present an exact field theoretical mapping of the parallel and Eigen models. We 
remark that field theoretical methods provide a unique and powerful set of tools for the analytical study of dynamical 
systems, such as reaction-diffusion [12, [11] or birth-death processes [11]. In this paper, we employ these theoretical 
tools to obtain exact analytical expressions for the equilibrium mean fitness and average composition of the population, 
for permutation invariant but otherwise arbitrary replication rate functions. 

In Section 2 we consider the parallel model. We consider horizontal gene transfer of non-overlapping blocks, as well 
as of blocks of random size. We also consider a recombination process producing a daughter sequence symmetrically 
from two parents, as might occur in viral super- or co-infection. In Section 3, we study the effect of these different 
genetic recombination schemes in the context of the Eigen model. In both models, recombination leads to two selected 
phases. Interestingly, beyond a critical recombination rate, the distribution of the population becomes independent 
of the recombination rate. Also interesting is that the steady-state distribution is independent of the crossover 
probability. 

To study the effect of epistasis, whose sign is determined by the curvature of the fitness landscape (second derivative) 
when represented as a function of the Hamming distance with respect to the wild-type, we considered two different 
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FIG. 1: Convention for the sign of epistasis, e. In the figure are represented two smooth fitness landscapes, as a function of 
u = 21 /N — 1, with A'' the total length of the (binary) genetic sequences and < I < N the number of beneficial mutations 
(number of '+' spins) along the sequence. In this representation, positive (synergistic) epistasis e > corresponds to a positive 
curvature / (u) > 0, while negative (antagonistic) epistasis e < corresponds to a negative curvarture / (u) < [tI. [Til [l^. 
The examples shown are a quadratic fitness landscape f{u) — kv? /2 (dashed line), with positive curvature and e > 0, and a 
square-root fitness landscape f{u) = k^/u (solid line), with negative curvature and e < 0. We set k = 4.0 in both examples. 

examples of smooth fitness functions: a quadratic function, representing positive epistasis, and a square-root function 
representing negative epistasis. We find that, for the quadratic fitness function, horizontal gene transfer and recom- 
bination introduce a mild load against selection. The opposite effect is observed for the square-root fitness, that is, 
horizontal gene transfer and recombination introduce an advantage by enhancing selection towards fittest genotypes. 
This results provide support for the mutational deterministic hypothesis, which postulates that recombination should 
be beneficial for negative epistasis fitness functions, and deleterious for positive epistasis fitness functions. Moreover, 
we prove analytically in Appendix [L] that the mutational deterministic hypothesis applies for the parallel model in 
the presence of horizontal gene transfer. A similar proof is provided in Appendix |M] for the Eigen model. We also 
show analytically that the mutational deterministic hypothesis applies for the case of two-parent recombination, as 
presented in Appendix [Nl for the parallel model, and in Appendix [Q] for the Eigen model. 

The effect of recombination becomes negligible for discontinuous fitness landscapes, such as a single sharp peak. For 
all these cases, we present exact analytical expressions that determine the phase structure of the population at steady 
state. Results are explicit for any microscopic fitness function: Eqs. (14), (31), and (62-63) for the parallel model and 
Eqs. (82), (93), and (106-107) for the Eigen model. We evaluate these expressions for three permutation invariant 
fitness functions: sharp peak, quadratic, and square root for the two common forms of quasi-specics theory, parallel 
and Eigen: Eqs. (22), (23), (33), (34), (68), (71), (85-87), (96-98), (112), and (113). We also present numerical tests 
supporting our analytical equations. 
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II. THE PARALLEL MODEL 

We consider a generalization [i^ of the parallel, or Crow-Kimura [1^, model to take into account the transfer of 
genetic material between pairs of individuals in an infinite population. 

dqi ^ ^^ Ekd Rlilkm , , 

-TT = r^qi + > ^iikqu + vN =^ vNq^ (1) 

Here, qi represents the (unnormalized) frequency of the sequence type Si = (s^^, s|, . . . , s^), with s* = ±1, for 

I < i < 2 and I < j < N . The normalized frequencies are obtained from pi = qi / Yl, j=i ■ Eq. Il]), ri is 



the replication rate of sequence Si. It is given that r; = Nf (^-^ The mutation rate from sequence Sj 

into Si is ^ij = fj.6dij,i — Nfiddij,o- The Kronecker delta in this expression ensures that mutations involve a single 
base substitution per unit time (generation). Genetic recombination processes between pairs of sequences in the 
population are represented by the nonlinear term. They are considered to occur with an overall rate v, while the 
coefficient Rl.^ represents the probability that a pair of parental sequences Sk, Si produces an offspring Si. Depending 
on the particular recombination mechanism, some of these coefficients will be identically zero. Also, these coefficients 

must satisfy the condition Xli=i ^ki = 1, V 1 < /c, ^ < 2^ . 

For this generic process, we will present the analytical solutions for the steady-state mean fitness by considering 
different schemes of genetic recombination. 



A. Horizontal gene transfer of non-overlapping blocks 

In this recombination scheme, we consider the exchange of blocks of genetic material between pairs of individuals. 
We consider these blocks to be non-overlapping in the parental sequences, and of a fixed size M. Thus, each sequence 
is made of N/M blocks. The recombination coefficients in the differential Eq. ((!]) are given for this horizontal gene 
transfer process by 

N/M-l Mib+1) \ N /i , „fc„»\ 

^i^= E n i-^) n ^ ■ (2) 

^=0 jt = AIb+l \ / \ / 

Here, < 6 < N/M — 1 represents the block index, while Mh + 1 < jb < M{h + 1) represents the site index within 
block h. 

Generalizing the method presented in [49l |. we write the non- linear term as 

iS-S'(j:(^))i,(^) 

Here, {Ai) = qiAi/ ^^^^ g,„ is a population average. At steady state, this average is independent of the value of 6, 
due to the symmetry of the fitness function. 

The variance of the composition = SjLi given by "Y^^ j'=i{^^\^^\')- absence of recombination 

or horizontal gene transfer this variance is 0{N~^), which implies correlations along the sequence are 0{N~^) [s^. 
We expect the same scaling of the variance in the presence of recombination or horizontal gene transfer. Therefore, 
we introduce the factorization 

I -Q i±^\ ^ n l-^^)+0{M/N) 

\jb=Mb+l I jf,=Mb+l \ / 

„. .+1 

3b ' 



(4) 

which becomes exact in the N ^ oo limit. Here, u{ji,) ~ Yi li^^j^/ 9™ the average base composition at site j^. 
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FIG. 2: Pictorial representation of the horizontal gene transfer process considered. 



We are interested in the long time behavior of the system, when the average base composition becomes independent 
of time and position u{j) ~ u. Thus, in the formalism of spin Boson operators [i^ a(j) = (ai(j), a2(j)), we define 
the recombination operator describing this recombination term by 



N/M-l 



N ^ 



5=0 



' M(6+l) 

n + P-4(jfe)][aiU) + a2(j&)] - i 

jt, = M6+l 



(5) 



Here, / is the identity operator. The coefficients (0± = (1 ± it)/2 represent [i^ the steady-state probability (per site) 
of having a or a Defining the matrix 



D 



P+ P+ 
P- P- 



the recombination operator in Eq. ([5|) can be expressed as 



6=0 



' A/(6+l) 
jt=Mb+l 



(6) 



(7) 



1. The Hamiltonian 



Considering the recombination operator in Eq. we formulate the Hamiltonian describing the system 



H = Nf 



1 ^ ^ 



N 



N/M-1 



6=0 



M(6+l) 
jt, = M6+l 



(8) 
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Here, '^s = ^ q ^ij '^^ ^ 1 y Pauli matrices. Wc introduce a Trotter factorization 

e-** = Jirn^y [Pz-P^l \zm) (^J[{zk\e-'"\zk-i)^ (zo\. 
As shown in Appendix |^ the partition function that gives the mean population fitness is 

Here, the action in the continuous time hmit is 



(9) 



(10) 



-N / dt 
Jo 



u - + /(O + — -A^lnQ. 



(11) 



2. The saddle point limit 



In the N oo hmit, the saddle point is exact and we obtain an analytical expression for the partition function 
Eq. (fTU]) . We look for the steady-state solution, when the fields become independent of time, ^c, 4'c, 4>c- The trace 
defined by Eq. (jAlOp in the long time saddle-point limit becomes 



lim 

t—>oo t 



= Y + [UL + "4) + (a* + 4/2)2] 



21 1/2 



(12) 



Hence, the saddle-point action is 

InZ 

N,t^oo Nt 



fn 



= lim 



t^oo Nt 



max |/(^c) - CcCc - ^c-Z-c - M - ^ + -^(Pc^ 



+ y + [U^c + u4>,) + + 0c/2)' 



,1/2 



(13) 



As shown in Appendix |Bl the mean fitness of the population is 



max^ J /(^e) - - ]g + ]^['^c(^c)]'' + Mv/T^T 



Vr^(l + |^[0,(f,)]^'^-i) 



(l + ^(l-«2)[0^(^J]M-l 



1/2 



(14) 

Here, 4>c is given by Eq. jBTj, and the surplus u is obtained through the self-consistency condition ~ /(«)• 
Equation (jl4[) represents an exact analytical expression for the mean fitness of an infinite population experiencing 
horizontal gene transfer. This expression is valid for an arbitrary, permutation invariant replication rate f{u). 

It is worth to notice that Eq. is a natural generalization of the single-site horizontal gene transfer process 
described in [i^. Indeed, specializing the Eqs. (|B7[) and to the particular case M = 1, after some algebra, we 
obtain 



-1<C<:<1 

which reproduces the analytical result in [4 



/,„(M = 1) = max^^ { fi^,) - - ^ + + yr^^c- 



,t2 



^'^2) -It 



-,1/2 



(15) 



3. Numerical tests and examples 



For numerical calculations, it is convenient to reformulate Eq. ([T]) in terms of the fraction of the population at a 
distance / from the wild type. Pi = "^j^CiPr Here, Ci is the class of sequences with I number of "-1" sites. The 

number of sequences within this class is ( ^ 
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As an example, for the case M = 3, the differential equation representing the time evolution of the probability 
distribution of classes within an infinite population of binary sequences is 



dt 



= N 



N 



f{2l/N-l)-J2Pi'fi'2l'/N-l)-, 

l'=Q 



Pi + t^N 



N-l + 1 
N 



P 



i-i 



l + l 
N 



Pi 



i+i 



+ {plg^[N - I + 3)Pi^3 + [pth{N -I + 2) + iplp+gz[N - I + 2)]P,_2 

+ [plh{l - 1) + ip-p\g3{N + + 3pip+h{N - I + l)]Pi-i 

+ [plh{N -l-l) + iplp+g^il + 1) + ip-p\h{l + l)]P,+i 

+ [plh{l + 2) + 'ip-p\g^{l + 2)]Pi+2 + plg:i{l + 'm+s] 

-^^ {(P- + iP-P+ + ip-p\)g^{N + {pi + iplp+ + p\)h{N - I) 

+ (pi + 'ip-pl + pl)h{l) + {p\ + 'ip+pl + 3p_p^).93(0} Pi 



(16) 



In writing this equation we have made use of the only 0{N ^) correlations between sites, which holds at long time 
as well as for short time with suitable initial conditions. Here, we defined 



1±M 



P± 



(17) 



where the average composition is calculated as 



N 



1=0 



N 



(18) 



and the functions 



53(0 

h{l) 



= 3 



l{l - - 2) 
N{N - 1){N ~2) 
1{1-1){N -I) 



N{N - 1){N -2) 



(19) 



A comparison between the analytical expression Eq. p4p and the direct numerical solution of the differential Eq. 
for N — 1002 is presented in Tabled where the quadratic fitness f{u) = kv? /2 was considered. We notice that 
the analytical method and the numerical solution provide the same results within 0{N^^), as expected from the 
saddle point limit. 

The differential equation representing the horizontal gene transfer of blocks of size M ~ 4 within an infinite 
population of binary sequences is given by 



d 



= N 



N 



J{2l/N-l)-Y,P,f{2l'/N-l)-_ 



l'=0 



Pi+pN 



N-l + 1 
N 



P 



l + l 



/-I 



P. 



l+l 



N 

+ ^N {giiN - I + A)ptPi-i + [pth^{N - / + 3) + Ap''_p+gi{N - I + 3)]P,_3 
+ [pth2{l -2)+ iptp+h3{N -1 + 2) 

+ Qplplgi{N - I + 2)]P,_2 + [pth^{l - 1) + p+/^2(/ _ l) 
+ &plplh^{N -l + l) + Ap^p\gi{N - I + l)]P,_i 

+ [pXhz{N -l-l)+ Ap-p\h2{l + 1) + 6plplh3{l + 1) + iplp+gAil + l)]Pi+i 
+ [p^h^il + 2) + Ap^plh^il + 2) + &plplgS + 2)]Pi+2 
+ [pXh^il + 3) + Ap^p\gi{l + 3)]Pi+3 + pXgi{l + 4)P,+4} 

- JtV {[pt + ^plp\ + ^plp+ + pX]h^{N -l) + [pt + &plp\ + Ap^p\ + pX]h^{l) 
+ [Aplp+ + &plpl + Ap^pl + pt]gi{N -l) + [4plp+ + 6plpl + Ap^pl + pX^l) 



[pi 



+ Ap^pX + pX]h2{l)}Pi 



(20) 
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TABLE I: Analytical versus numerical results for horizontal gene transfer in the parallel (Kimura) model for the quadratic 
fitness f{u) = with M — 3. 



k/fi 




numeric 

U 


analytic 

U 


z.U 


U.U 




A C AAA 


Z.U 


U.O 




A AOOO 


n 
z.U 


l.U 


U.4DUO 


U.4D ( 1 


z.U 


1.0 


U.401U 


U.4oly 


O K 
Z.O 


A A 
U.U 


A K AAK 


A CAAA 

U.oUUU 


Z.O 


A CC 
U.O 


A c^Ai p: 

U.oyio 


A tXAOA 


2.5 


1.0 


0.5838 


0.5844 


2.5 


1.5 


0.5766 


0.5772 


5.0 


0.0 


0.7998 


0.8000 


5.0 


0.5 


0.7988 


0.7990 


5.0 


1.0 


0.7979 


0.7981 


5.0 


1.5 


0.7970 


0.7972 



TABLE II: Analytical versus numerical results for horizontal gene transfer in the parallel model for the quadratic fitness 
f{u) = with M = 4. 



k/fi 


u/fi 


numeric 

U 


analytic 

U 


2.0 


0.0 


0.4993 


0.5000 


2.0 


0.5 


0.4832 


0.4840 


2.0 


1.0 


0.4672 


0.4680 


2.0 


1.5 


0.4510 


0.4519 


2.5 


0.0 


0.5995 


0.6000 


2.5 


0.5 


0.5916 


0.5921 


2.5 


1.0 


0.5839 


0.5845 


2.5 


1.5 


0.5766 


0.5773 


5.0 


0.0 


0.7998 


0.8000 


5.0 


0.5 


0.7988 


0.7990 


5.0 


1.0 


0.7979 


0.7981 


5.0 


1.5 


0.7970 


0.7973 



Here, the parameters p± and u are defined, as before, by Eq. p7p and Eq. ([T8|) . respectively. We also define the 
functions 



54(0 

hail) 



Ijl - 1){1 - 2){l - 3) 
N{N- l)(iV-2)(iV-3) 

N{N- l)(iV-2)(iV-3) 



^' N{N ~1){N -2){N -3) ^ ' 

A comparison between the analytical expression Eq. (|14p and the direct numerical solution of the differential Eq. 
(pO)) for N = 1002 is presented in Table HTl for the quadratic fitness f{u) = fcu^/2. As in the former case, the numerical 
and analytical results agree to within 0(7V~^), as expected. 

For the quadratic fitness case in the absence of recombination [v = 0), the exact analytical result predicts the 
existence of a "selected" organized phase, or quasi-species, when k > fj.. In this phase, the average composition is 
given by M = 1 — ^/k. For k < fj., a phase transition occurs and the quasi-species disappears in favor of a disordered or 
"unselected" phase with u = 0. In Figure [31 we display the phase structure in the presence of horizontal gene transfer. 
In agreement with the numerical results presented in Table U and Table |lTl the recombination scheme considered in 
this model introduces a mild mutational load. However, near the critical region k/ fi ^ 1, one observes that horizontal 
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0.8 
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1.6 



FIG. 3: Phase diagram of the parallel (Kimura) model for the quadratic fitness f{u) = ku'^ /2, with horizontal gene transfer of 
non-overlapping blocks of size AI. The phase boundary of the error threshold phase transition is given by the curve, and its 
shape is independent of the block size M . In the absence of horizontal gene transfer, the phase transition occurs at fc//i = 1. 



gene transfer distorts the phase boundary which defines the error threshold, from the horizontal line k/fi — 1, to 
a monotonically increasing curve that saturates for large values of v/fJ-. We obtain an analytical expression for the 
phase boundary, by expanding Eqs. (|B7P and ([T4|) near the critical region ~ 0, w ~ 0. We find that the boundary 
is defined by 

fccHt-M^i-^. (22) 

We notice from this expression that for small v, fccrit ^ fJ. + i^/2, whereas for large ly the phase boundary becomes 
asymptotically independent of fccrit 2/i. We also notice from this formula that the phase boundary is independent 
of the block size M. 

As a second example, we consider a square-root fitness function 



fiu) = k^\u\ (23) 



In Table Hill we present a comparison of our analytical result, obtained from Eq. (jl4[) . with the direct numerical 
solution of the differential Eq. (|16|) . for M ~ 3. As in the quadratic fitness example, the analytical and numerical 
results agree to order 0{N~^), as expected. 

From the results presented in Table IIIIl it is remarkable that the average composition u, and correspondingly the 
mean fitness of the population /,„ ~ fcy^juj, increase when increasing the horizontal gene transfer rate v. 

The mutational deterministic hypothesis states that recombination is beneficial for negative epistasis fitness func- 
tions (see Fig.[T]) / (u) < 0, and deleterious for positive epistasis fitness functions, / (u) > 0j B [13, UHj Elj H ■ Our 
results for the quadratic and square-root fitness functions, Eqs. p4)) - ((22l) and Tables HI |TT1 and IIIIl provide support 
for this hypothesis. In fact, we can prove the mutational deterministic hypothesis holds for the parallel model in the 
presence of horizontal gene transfer. Appendix IlI 

Horizontal gene transfer has less of an effect for the sharp peak fitness, f{u) = AS^^i. For general M, the maximum 
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TABLE III: Analytical versus numerical results for horizontal gene transfer in the parallel (Kimura) model for the square-root 
fitness f{u) = fcvlul, with M = 3, N = 801. 



k/fi 






^analytic 


2.0 


0.0 


0.4858 


0.4855 


2.0 


0.5 


0.4892 


0.4889 


2.0 


1.0 


0.4918 


0.4915 


2.0 


1.5 


0.4939 


0.4936 


2.5 


0.0 


0.5399 


0.5396 


2.5 


0.5 


0.5428 


0.5425 


2.5 


1.0 


0.5450 


0.5448 


2.5 


1.5 


0.5469 


0.5466 


4.0 


0.0 


0.6525 


0.6523 


4.0 


0.5 


0.6542 


0.6540 


4.0 


1.0 


0.6556 


0.6554 


4.0 


1.5 


0.6568 


0.6565 



in Eq. (fTl]) is achieved for = 1, with 0c(l) = (1 + u)/2 from Eq. (jB7p . Thus, one obtains 



1 - 



1 + u 



M 



(24) 



The error threshold is given for m = by the condition A > ji + -^(1 — 2 ^^). However, we notice from Eq. 
that fm{u = \) = A — ^ > fm{u ~ 0). Therefore, we have u = 1 — 0{N~^) in the selected phase, with the effect of 
horizontal gene transfer being negligible for finite M. We obtain the fraction of the population located at the peak 
Pq, from the self-consistency condition PqA — /„, which yields Pq = 1 — fi/A. Thus, the true error threshold is at 
^crit = with the condition A > n + — 2^^^) defining the limit of metastability for initial conditions with 
u ^ 0. These results are similar to the ones obtained in the absence of horizontal gene transfer [H, H^, [sHi . Thus, 
we conclude that for the sharp peak fitness, horizontal gene transfer does not spread out the population in sequence 
space. This result differs from the numerical studies presented in [50| . where a mathematical model for 'uniform 
crossover' recombination between viral strains super-infecting a population of cells was described. We remark that 
this model studied sequences of finite length (N — 15), where the error threshold transition is not really sharp. Our 
results correspond to the more realistic limit N —> oo (typical viral genomes are 10'^ — 10^). 

In summary, from our exact analytical formula for the mean fitness Eq. (|14p . which is valid for any permutation 
invariant replication rate, we developed the explicit solution of three different examples: a quadratic fitness, a square- 
root fitness and a single sharp peak. For the case of smooth fitness functions, from our exact analytical formulas for 
the mean fitness fm and average composition u, we conclude that in agreement with the mutational deterministic 
hypothesis 0, H, [lOj El, 13 , a population whose fitness represents positive epistasis (i.e. quadratic), will experience an 
additional load against selection due to horizontal gene transfer. On the contrary, when negative epistasis is present 
(e.g. square- root), horizontal gene transfer is beneficial by enhancing selection. We provided a mathematical proof for 
this effect, Appendix [Ll When the fitness is defined by a single sharp peak, the population steady-state distribution 
behaves more rigidly in response to horizontal gene transfer. This fundamental difference can be attributed to the 
structure of the quasi-species distribution, which in the smooth fitness case is a Gaussian centered at the mean fitness, 
while in the sharp peak it is a fast decaying exponential, sharply peaked at the master sequence [sst . While the 
Gaussian distribution spreads its tails over a wide region of sequence space, thus allowing for horizontal gene transfer 
effects to propagate over a large diversity of mutants, the sharp exponential distribution concentrates in a narrow 
neighborhood of the master sequence, acting as a barrier to the propagation of such effects. 



B. Horizontal gene transfer for multiple-size blocks 



A natural extension to the model of horizontal gene transfer involving blocks of genes of a given size is to consider 
a process where each site along the sequence may be transferred with probability 7, or left intact with probability 
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1—7- The operator describing this process is 



R 



1 ^ 



(M)- 



(25) 



Here, Rj = {j)Da{j) is the single-site recombination operator defined in Eq. ([51), with the matrix D defined as in 
Eq. Notice that this operator represents a binomial process, where an average number of sites (M) = is 
transferred. If we consider, as in the former finite block size case, that N/{M) = 0{N), then we have 7 = {M)/N, 
and for very large N Eq. (|25|) reduces to 



1 ^ 

\ ' .7 = 1 



(A/) (A/) 



(Af)- 



(26) 



Considering the recombination operator defined in Eq. (|26p . the spin Boson Hamiltonian for the Kimura model 
becomes 



H = Nf 



N 



(M) 



NI. 



N 



j=i \ ' 



We introduce a Trotter factorization 

e-"' - lim / [V^Vz\ \zm) ( TT(zl|e 
As shown in Appendix [Cl the partition function becomes 



M 



Here, the action in the continuous time limit is 



-N / dt' 
Jo 



(M) 



(M) 



- iVlng 



(27) 



(28) 



(29) 



(30) 



1. The saddle point limit 



As in the previous model, the saddle point limit is exact as iV ^ 00 in Eq. (|30p . 

After a similar procedure as in section II. A. 2, we find the saddle-point equation for the mean fitness 



1/ V 



(A/) (A/) 



1/2 



Here, 0c (^c) is obtained from the equation 



1 - 



l+^(l-n2)e-(M>(i-*c) 



1/2 



(31) 



(32) 
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TABLE IV: Analytical results for horizontal gene transfer in the parallel model for the quadratic fitness f{u) = with 
{M) = 3. 



k 


1/ 


anal vt. i c 


2.0 


0.0 


0.50 


2.0 


0.5 


0.4840 


2.0 


1.0 


0.4680 


2.0 


1.5 


0.4522 


2.5 


0.0 


0.6000 


2.5 


0.5 


0.5921 


2.5 


1.0 


0.5845 


2.5 


1.5 


0.5773 


4.0 


0.0 


0.8000 


4.0 


0.5 


0.7990 


4.0 


1.0 


0.7981 


4.0 


1.5 


0.7973 



TABLE V: Analytical results for horizontal gene transfer in the parallel model for the square-root fitness f{u) = k^/\u\, with 
(M) = 3. 



k 


1/ 


^analytic 


2.0 


0.0 


0.4855 


2.0 


0.5 


0.4889 


2.0 


1.0 


0.4915 


2.0 


1.5 


0.4936 


2.5 


0.0 


0.5396 


2.5 


0.5 


0.5425 


2.5 


1.0 


0.5448 


2.5 


1.5 


0.5466 


5.0 


0.0 


0.6523 


5.0 


0.5 


0.6540 


5.0 


1.0 


0.6554 


5.0 


1.5 


0.6566 



Eq. (|3ip represents an exact analytical expression for the mean fitness fm of an infinite population experiencing 
horizontal gene transfer of multiple size sequences. The formula is valid for an arbitrary, permutation invariant 
replication rate function f{u). 

We notice that recombination introduces an additional mutational load against selection. This load is mild at low 
values of the fitness constant fc, and becomes negligibly small at larger values. Numerical evaluation of Eqs. pi]) and 
([5^ is presented in Table HVl for the quadratic fitness f{u) = ku^/2, and average block size (M) ~ 3. 

An analytical expression for the phase boundary is obtained from Eqs. (|3ip and (j32[) . near the error threshold u ~ 0, 
0. We find 

1 + ^ 

kcit = M (33) 

We notice that for small v, the critical value is fccrit ~ A* + i^/2, whereas for large values of it becomes independent 
of recombination fccrit ~ 2/x. This behavior is similar to the one previously observed in Fig. [3] for the case of horizontal 
gene transfer with blocks of fixed size. The shape of the phase boundary is independent of the block size in the 
horizontal gene transfer process, assuming that the size of the blocks is finite. 

As a second example, we consider the square root fitness f{u) = k^/\u\. Analytical results for the average com- 
position, obtained after Eq. HH), are represented in Table IVl for blocks of average size (M) = 3. From the values 
displayed in Table |Vl we notice that horizontal gene transfer introduces a mild increase in the average composition 
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and; correspondingly, in the mean fitness of the population fm = fc\/|uj. This trend, which is opposite to the quadratic 
fitness case, can be attributed to the negative epistasis represented by the square root fitness, by similar arguments 
as in the case of fixed block size. 

Horizontal gene transfer does not affect the phase boundary for the sharp peak fitness, f{u) = ASu^i- In this case, 
Eq. ((3T|) is maximized at = 1, with 0c = (1 + w)/2 from Eq. ((32)l . Thus, the mean fitness becomes 

U = A-^,-^[l-e-(^'^('--y'] (34) 

The error threshold is given, for u = in Eq. (|34[1 . by the condition A > ^ + "^^[1 ^ e^^^^^/^]. However, we notice 

that fm{u = 1) = A — /i > fm{u = 0). Hence, in the selected phase u = 1 — 0{N^^), and the recombination 
effect becomes negligible for infinite N. From the self-consistency condition /„ ~ PqA, we obtain the fraction of 
the population located at the peak Pq = 1 — ^/A. Therefore, the true error threshold is given by Acrit > with 
A > /i + -^^[1 ^ e^^^^^/^] the limit of metastability for initial conditions with u ~ 0. 

Therefore, we conclude that horizontal gene transfer for multiple size blocks displays a qualitatively similar behavior 
to the corresponding process for fixed block size. A population evolving under a smooth fitness function with positive 
epistasis (e.g. quadratic, see Fig. [T]) experiences an additional mutational load due to horizontal gene transfer, which 
modifies the quasi-species structure, reducing the mean fitness, and hence shifting the error threshold. On the contrary, 
when epistasis is negative (e.g. square-root, see Fig. [1} a beneficial effect is induced by horizontal gene transfer, in 
agreement with the mutational deterministic hypothesis, as we demonstrate in Appendix iLl 

A discontinuous sharp peak fitness function does not change the quasi-species distribution or the mean fitness, 
although it does introduce metastability. 



C. The parallel model with two-parent recombination 



Biological recombination, as occurs for example in viral super- or co-infection or in sexual reproduction, involves 
the crossing over of parental strands at random points along the sequence. The copying process is carried out by 
the action of polymerase enzymes, which move alternatively along one or the other parental strand. An approximate 
representation of this process is to consider that the polymerase enzyme starts, with probability 1/2 on either parental 
strand, copying one base at a time. Wc consider the crossovers to occur because there exists a probability per site 
that the polymerase "jumps" from its current position towards the other parental strand. Alternatively, the enzyme 
progresses along the current strand with probability 1 — Pc- A pictorial representation is shown in Fig.|4l 

For this particular process representing the wandering path followed by the polymerase enzyme, the recombination 
coefficients R\.i in Eq. ([T]) are given by the exact analytical expression 



{a,=±l} 



1 ^ f^ + SisW fl + s[s\ 
2 



l + °2 

„i \ 2 / 1 I 



X[(l-Pc) ^ P, 

1+ .. ^ .. 
X ... X [(1 -pc) 5 



1 + ajV 



(35) 



Here, the recombining parental sequences are Sk = (s^, Sj, . . . , s|f). Si = {s{, . . . , sjy) and the offspring sequence is 
Si = {s\, s\, . . . , s]y), with Sj = ±1. Using Eq. (|35|) . Eq. ([T]) representing the time evolution of an infinite population 
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1-Pc 



1-Pc 



1-p= 



1/2 



du 



o 



1 1 1 1 



I Parent 



1/2 » 



' rn-m 



Parent 2 



I 



rn-m n«.n 



ring 



FIG. 4: Pictorial representation of the two-parent genetic recombination process considered in the theory. 



of binary sequences experiencing replication, point mutations and two-parent recombination, exactly becomes 

2" 2' 



dt 



k=l 



k=l {qj=±1} 



N 1 

Up-. 

J=2 



1=1 



1-4 

2 + 



qk - vNqi 



(36) 



where, again, pi = qij X]/=i 9' ^^e normalized probability for sequence 1 < Z < 2 . 

From Eq. p6p . the recombination operator corresponding to this recombination process in the spin Boson repre- 
sentation is 



1=1 {q, = ±1} 

x...x[(l-p,) ^ ^ ]x[/^^ Ri{N)^]-I 

^giiRiij)})-! 

Here, the local recombination operator is Ri{j) — a{j)'' D'ja{j) , with 

/ l+s' 1 + s' \ 



D ■ = 2 2 



(37) 



(38) 
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The Ij are the identity operators acting on site \ < j < N , whereas / = Hj^^i ^^e identity operator for the entire 
sequence vector. 



1. The Hamiltonian 



The Hamihonian describing the evolution of this system in the spin Boson representation is given by 



H^Nf 



N 



3 = 1 



We introduce a Trotter factorization 

e-^* = Km / [Dz'Vzl \zm) ( T\(^k\e- 
As shown in Appendix |D] the partition function is 



N 

E P^O>i^(j) ~i]+'^N [g[{Ri{j)}] - l) 



M 



Zk^i) {zo\ 



Here, the action in the continuous time hmit is given by 



N / dt f-ft 



(39) 



(40) 



(41) 



(42) 



As shown in Appendix [El the recombination term can be represented, for < Pc < 1/2, by the exact finite series 



N 



9 



N 



TV 



i<i<j 

N 

l<z<j </c<n 



TV 

X n 

'm^i,j.k.7i 



+„_fe i-V'ji-^U^v4.i~V'i 

2 2 2 2 

(i-2pe)^-jn(^^ 



(43) 



were we used the notation ■0^ = zl,(j)DjZk-i{j), and Dj is defined in Eq. ((38|) . 
We consider first the case when = 1/2 in the above expression. Then, we have 

2" N 

5({^j-},p. = l/2) = ^p,n(l + V^l)/2 
1=1 ]=1 



(44) 



We notice that the recombination term in the differential Eq. ([55)1 satisfies J2'i=iPi^ki — ^ because R\i > 
and X]i=i -^I-i = 1- o^i^' fi'^l'^ theoretic representation of the model, this condition is equivalent to ^({V'j}) 1^ 1 for 



any physical state. We also have, for example, yJlj^i ^ 9 
term perturbatively, as in Appendix [Aj we obtain terms such as 



N ( 1+V'', 



< 1. If we consider evaluating the g interaction 



{g) = [[l + m/2f + [{l + mi2f-' 



(45) 
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where (^) = J2iPiO^/-^)J2ji''Pj)z- Since both the correlations of the spins in the matrix for typical, likely I 
and the correlations in the z fields are each 0{1/N), the interaction term g in Eq. ((44|) contributes nothing, unless 
(i/O = 1 - ©(l/Af), in which case (5) = O(iVO). 

For the general case of < Pc < 1/2, we notice that < 1 — 2pc < 1. Making the ansatz that correlations between 
z fields and correlations between spins of typical, likely sequences / each remain 0{l/N) at different sites, terms other 
than the first in Eq. (|33]) are at least 0{1/N) smaller when (V^) = 1 - 0{1/N). Thus, when (V') ~ 1, the first term 
dominates the series, and the others become arbitrarily small, thus recovering the same expression as for pc = 1/2. 
On the other hand, when (ip) ~ —1, we notice that the dominant terms are the last ones. However, those terms 
are proportional to powers of 1 — 2pc of order N, whereas the number of these terms is of just polynomial order in 
N. Therefore, for N very large these terms become arbitrary small. Thus, we conclude that in the limit N 00, 
regardless of the value of Pc, the function g is represented by Eq. ([33]). 

In the particular case of uniform crossover pc = 1/2, and when the fitness function is permutation invariant, i.e., 
it depends only on the average composition of the sequence through the average base composition m, it is possible 
to reformulate the differential equation Eq. ([T]) for the evolutionary dynamics of an infinite population of binary 
sequences in terms of the distribution of classes: 

Pi^Y.P^ (46) 
iec, 

where Ci represents the class of sequences with I, "—1" spins. Although all the sequences in a given class do not have 
the same dynamics, we can nonetheless calculate the class dynamics exactly: 



dt 



N 



N 



f{2l/N-l)-Y,Pi>f{2l'/N-l) 



l'=0 



Pi+fi{N -1 + l)Pi^i + + l)Pi+i - NuPi 



+vNY,R{l\hM)PhPi.-NvPi. (47) 

The coefficients i?(/|/i,Z2) represent the probability that a pair of parental sequences in the classes Cjj, Ci^, due to 
uniform crossover recombination, generate a child sequence in the class Ci. The number of sequences in these classes 

. ( n\ ( n\ ( n\ 

is I j ' I ^ j i ^ / ' '''^^P'^*^^^^^^-^' ^ given pair of parental sequences, let us consider the variables 

n_| , n |_ and n , representing the number of pairs of (+1, +1), (+1, ^1), (^Ij +1) a-nd (—1, —1) spins respectively. 

These variables satisfy the equation N = n++ + + n |- + n We further notice that these variables also satisfy 

n [. = li — n and = h — n Considering that from each pair of (+1, —1) or (—1, +1) spins in the parental 

sequences, the child sequence will inherit a "-1" spin with probability 1/2, while from a pair of the kind (—1,-1) it 
will inherit a "-1" spin with probability 1, we have the explicit analytical expression for these coefficients 



mill {ii+i2-;. (1.(2} ( N 
n=max{0,;i+;2-Af} U J U 



(48) 



The first factor is the probability for a configuration with n = n , given li, I2 and /. The second factor is the number 

of ways of picking / — ri__ "-1" spins among + n |_. The third factor is just (1/2)"^+ (1/2)"+^ (1)"-- . These 

coefficients are different from zero only if 

max{0,;i +/2 - N} <l< miniNJi+h} (49) 
They also satisfy the following properties: 

R{l\h,l2)^R{l\h,h) (50) 

N 

Y^miuh) = f V/i,Z2 (51) 

1=0 



R{N\N,N) = i?(0|0,0) = 1 



(52) 



17 



In the limit of large A'', we find that the recombination coefficients satisfy a Gaussian distribution in the variables 
Ml = 1 - 2I1/N, U2 = 1 - 2I2/N, and w = 1 - 21/N (see AppendixE]): 



it,, 



-N[(ui+U2)/2-u]'^ /(l-ul) 
^71(1 - ul)/N 



(53) 



where = f{u^). 

This form of the recombination operator, Eq. (|53p . is equivalent to Eq. (|44p with replaced by u in the D matrix. 
Alternatively, we notice that when the singular behavior of the function g can be described as a delta function, we 
have 



1=1 
2 



1=1 -^n 



E 



Pi 



N 



1 /^!^ 

2! ViV 



— ( 

27r 

d\ 

1 N 

E ^k{j)D]zk-i{j)zl{m)D^^Zk-i{m) 

j,?n — 1 



(54) 



By noticing that correlations between compositions at different sites along the sequence are of order 0{N ^), we have 
that for the second order correlation 



{D]Dl) 



I \2 



(55) 



where {D,) = J2i=iPi^j = ^^'^ population average. A similar analysis for the higher order correlations allows 



(56) 



us to factorize order by order the terms in the series Eq. ((54)) . to obtain 



We are interested in the long term, steady state distribution, when the average base composition u{j) = (sj) ~ u 
becomes independent of time. In this limit, the trace defined by Eq. (jD9p becomes 



lim 



InQc 



t— »oo t 2 



1/2 



Hence, from Eq. ([42]) . the saddle point action is 



lim 



InZ 



lim 



-Sc 



N.t^oo Nt t^oo Nt 

= max 

Sc,ia,<t>c,<Pa 



66 - 4>c4>c - I-J-- ly + vg{4>c) 



/(6) 



As shown in Appendix [Gl we find 

Nt 



S,c{^c + U(t)c) + /i + 



max <j /(6) - fi-v + vg{(t>c) + ^ ^ i2(t>c - 1 - "6) 
— 1 — 



1/2 



1- 



[(20,-l-<,)2-(l-u2)(l-6')] 



1/2 



(57) 



(58) 



(59) 



Because of the singular behavior of the function g{(l)c), to find the saddle point we need to consider three separate 
cases: (j)c < 1, (/^c = 1, and 0c = 1 — 0{1/N). The existence of different expressions for the mean fitness suggests the 
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possibility of different selected phases in certain conditions. We also notice that the saddle point analysis may not 
apply exactly, unless 5 (0c) = (^^ci- 

Case 1: (j)c < 1. For this case, we look for a saddle point in the field (pc in the interior of the domain, 0c < 1 where 
5(0c) = 



Nt J l-"'[(20c-l-<c)2-(l-u2)(l-^2)]i/2 
From Eq. (j60p . we solve for (j)c as a function of fc 



Mic)^—j^ + ^VT^. (61) 



Substituting Eq. ((6T|) in the saddle-point action Eq. ((59l) . we obtain 

/£) = _ max^^ {/(Cc) -f^->^ + mVw!} (62) 
Case 2: 0c = 1. The mean fitness is obtained from Eq. (j59p as 



/(?)^ nmx {/(Cc)-A* + 7-^(l-<c-|<c-^^'|)} (63) 

Case 3: 0c = 1 — 0{1/N). In this case, additional analysis is necessary to calculate the mean fitness due to 
the singular behavior of the g{4>c) function. For a smooth fitness function, we can argue this case does not exist. 
We first consider the Hamiltonian ((5^ for the case g = Q. The largest eigenvalue, /„, is shifted by —v relative 
to the V = Q case. This allows us to calculate the average composition, u*, from the implicit relation fm{v) = 
fmi^ = 0) — V = f{u^,). Alternatively, if we consider the differential equation for the unnormalized class probabilities, 
dQ/dt = LQ, we see that the differential operator L looks like that in the absence of recombination, save for a 
shift of —1/ in the fitness function. Thus, the variance of the population is given by [11] a'^/N ~ 2^,u^J[N f {u^)]. 
Considering more carefully the g function, we find J duidu2i?"j„2P(Mi)P(M2) = exp[— A^(u — w,)^/(2<T^)]/V27rCT^]V, 
with /2 + (1 — u1)/2. This term is exponentially negligible compared to the —vP{u) term when cr^ < cr^, 

since P{u) = cky>[—N{u — u*)^/(2o-2)]/y^27R7j]V. In other words, we must strictly be in case 1 when 

l-it^ < 2^u,//'(m,). (64) 



We denote the value of v at which 



1 — = 2^u,//'(u,) at V ^ (65) 



as Vf. Now, at this value of i'* we have J duidu2R!^-^u^P{ui)P{u2) = Pi^)- Thus, the term proportional to v in 
Hamiltonian ([55]) . or differential equation ([T71) . exactly vanishes. Thus, we have dfm/dv ~ and dP{u)/dv = at 
this value of v. There is spectral rigidity. This implies that for 1/ > i/,, the distribution P{u) is independent of and 
that the value of is constant. In other words, the value of fm in case 2 must be constant with v. Assuming /„ 
varies continuously with v in case 1, and that the fitness values for case 1 and case 2 are equal at a single value of 
therefore, case 2 is simply case 1 with the value v = u^, 

frn{v > V*) = fni{v = U*) (66) 

Eqs. (|62p . (|63p provide an exact analytical solution for the mean fitness of an infinite population, for a general 
permutation invariant replication rate represented by a continuous, smooth function f{u). 

For a non-smooth fitness function, additional analysis is necessary, since f'(u^) is undefined, and P(u) may no 
longer be Gaussian. 



2. Examples and numerical tests 



We investigate the phase diagrams, as predicted from our theoretical equations Eqs. ([62ll . ([63|) for three different 
fitness functions: A sharp peak, a quadratic fitness landscape and a square-root fitness landscape. 
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For the sharp peak landscape f{u) = A5u,i, we notice that the maximum is achieved at = 1, with u = 1 — 0{N ^) 
From Eqs. ([55)1 and ((^ . we obtain 



(67) 



Therefore, for the sharp peak only a single selected phase is observed. In this case, the function g{(j)c) is not exactly 
a Kronecker delta ^^^.^i, we are in case 3, and thus we find a small correction, approximately linear in v, to the 
saddle-point prediction. In the selected phase, where the population is exponentially localized near it = 1 for large N, 
Eq. (^5)) becomes Z2) ~ ('1 + /2)!2^'^^'^/[/!(/i + h ~ O']- By analyzing the differential equation at zeroth-order 

in v for large N, we find that the class distribution is given by P/°'' = Plf\l 
order in i/, the fraction of the population Pq located at the peak is given by 



P^°'>y. Hence, we find that at first 



Po = 1 - ^i/A - vjA 



1-4- 



A 



(68) 



We note that this value of /m = APi^ interpolates between /m"* for Aj [i = 1 and /m'' for Aj \i = 00. There is no 
dependence on pc because the -1 spins are separated by 0{N) sites. 

0.76, , < ' 



f(2) 



0.75 



0.74 ■ 



G-O v/fi = 2.0, A/n = 4.0 
□-□ v/fi = 1 .0, A/n = 4.0 
-- u.,„ =0.7449 



0.73- 



0.724 
10" 



10 



1/N 



lO' 



10'^ 



FIG. 5: Convergence of the numerical results towards the theoretical value for two-parent recombination in the parallel (Kimura) 
model for the sharp peak fitness. In this example, A/fi — 4.0. 



As a second example, we consider the quadratic fitness landscape, f{u) = kv? 12. This smooth, continuous fitness 
function allows for the use of the exact analytical formulas Eq. ([5^ , . By maximizing Eq. ([5^ with respect to 
^c, when (j)c < 1 and hence g{4>c) = 0, we find 



(1) 



fl-a'-^ 

V kJ k 



(69) 



This mean fitness defines a selective phase SI. 

According to our previous analysis, when 0c = 1 and g{4>c) = 1, we maximize Eq. (|63p in ^c- Here, we consider that 
the order parameters ^j, and u have the same sign, > 0. We then have u£_c ^ in Eq. (j63p [5^. Hence, we find 



(2) ^ k/ 2ii\ 
2 I k J 



(70) 



which defines a second selective phase S2. 
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TABLE VI: Stochastic process versus analytical theory for two-parent recombination in the parallel model for the quadratic 
fitness f{u) = ku'^ /2, with k/fi = 4.0, u/fi = 3.0, and N = 100. 





_^stochastic 


_^analytic 


0.1 


0.7065 


0.7071 


0.3 


0.7052 


0.7071 


0.5 


0.7058 


0.7071 



By applying the self-consistency condition /, 



(1,2) _ 



fcu^/2, we find the following phases 



SI 

S2 
NS 



1 ^V- — 
k) k 



1/2 



— < - < 1 - 

/I k 



V k k 2 

0, otherwise 



2zy 



-,1/2 



(71) 



We note that the phase transition between case 1 and case 2 is exactly as predicted by Eq. We further note that 
the mean fitness is independent of v for v > i/^, = ji^ /{2k), exactly as predicted by Eq. (|66p . 



0.15 




0.05 



FIG. 6: Probability distributions for two-parent recombination in the parallel model for the quadratic fitness f{u) — 

with k/ fj, = 4.0 and v / ^ = 3.0, obtained from stochastic simulations with M = 10 000 sequences of A*' = 100 bases and different 

values of pc . 



The system of differential equations (j47[) provides an exact representation of the evolution dynamics for an infinite 
population, when uniform crossover probability = 1/2 is assumed. On the other hand, our analytical equations Eq. 
(f62|) . Eq. ((63|l for smooth fitness, or Eq. ([68]) for the discontinuous sharp peak, predict that the equilibrium results 
should be independent of the crossover probability Pc- To test this theory, we performed exact stochastic simulations 
based on a Lebowitz/Gillespie algorithm [s^, We generate a population of M — 10 000 sequences initially in 
the wild-type. The size of the finite population represented in the simulation was chosen large enough such that the 
results become independent of size M. Then, the population is evolved in time by point mutation, recombination and 
replication with rates proportional to /i, i/, and /(w') respectively, with u' = -i- X]^i ^] average composition of 

sequence Si, 1 < I < M. For that purpose, a list is generated by defining: t; = /i + + f{u'-), t = J2f=i '^i- With 
probability t;/t, a sequence 1 <l < M la chosen from the population to undergo either a single point mutation with 
probability h/ti, replication with probability f{u^)/Ti, or recombination with another sequence with probability v/ti 
according to the process described in Fig. [H 

To preserve the size M of the population, when replication or recombination is performed, a sequence chosen 
at random from the population is substituted with the offspring. The time increment after any of these events is 
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0.71 ' .5 -4 -3 -2 

10 10 10 10 

1/N 

FIG. 7: Convergence of the numerical results towards the theoretical value for two-parent recombination in the parallel model 
for the selective phase SI in Eq. (|7ip . In this example, k/^i = 4.0 and v/ ^ < 1/8. 




FIG. 8: Convergence of the numerical results towards the theoretical value for two-parent recombination in the parallel model 
for the selective phase S2 in Eq. (|7ip . In this example, k/y, = 4.0 and v / y > 1/8. 



performed is calculated as dt = —log(w)/{NT), with w G (0, 1] a uniformly distributed random number. The results 
obtained from this stochastic simulation are compared with the theoretical prediction in Table IVTIl for the sharp peak 
fitness landscape and uniform crossover pc = 1/2. 

In agreement with our theoretical prediction, as shown in Table IVII from stochastic simulations in the quadratic 
fitness landscape, the effect of recombination is independent of the polymerase crossover probability pc- The proba- 
bility distributions obtained for the systems considered in Table IVII are displayed in Fig. \E\ Clearly, the distributions 
are independent of Pc, in agreement with the theory. 

We obtain a direct numerical solution of the deterministic system of differential equations Eq. ()47p , which provides 
an exact representation of the evolution dynamics for an infinite population experiencing uniform crossover recombi- 
nation = 1/2. A comparison between these numerical solutions, and results obtained from the stochastic simulation 
for a system large enough to eliminate finite size effects, is displayed in Table IVIII for the sharp peak fitness. The 
theoretical prediction from the analytical formula Eq. (|68p is also shown for comparison. It is evident from this table 
that the effect of recombination is independent of the polymerase crossover probability pc, in agreement with our 
theoretical predictions. 

From the data presented in Table IVIII we notice that the deterministic system of differential equations provides 
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TABLE VII: Stochastic process versus differential equation for two-parent recombination in the parallel model for the sharp 
peak fitness, A/ ^ = 4.0, N = 400. 





^stochastic 




rjstochastic 
Jo 1 Pc — 


0.1 


rjstochastic 
Jo ) Pc — 


0.3 


r>stochastic 
J 1 Pc — 


0.5 


r>diffcq 

-'o 


panalytic 


0.0 


0.998337 


0.998336 




0.75017 




0.75017 




0.75017 




0.75016 


0.75 


1.0 


0.998329 


0.998326 




0.7455 




0.7454 




0.74591 




0.74544 


0.7449 


2.0 


0.998312 


0.998317 




0.7415 




0.7414 




0.74085 




0.74140 


0.7398 



TABLE VIII: Analytical theory versus numerical solution for two-parent recombination in the parallel model for the quadratic 
fitness f{u) = with iV = 800 and fc/^i=4.0. 





diffoq 


analytic 


0.0 


0.7499 


0.7500 


0.025 


0.7417 


0.7416 


0.05 


0.7329 


0.7331 


0.1 


0.7202 


0.7159 


0.5 


0.7091 


0.7071 


1.0 


0.7083 


0.7071 


2.0 


0.7075 


0.7071 


3.0 


0.7073 


0.7071 



an accurate representation of the underlying stochastic dynamics for the case of uniform crossover, pc = 1/2. Thus, 
the results obtained from the numerical solution of the deterministic system of differential equations can be fairly 
compared with the analytical theory. 

It is remarkable that the small, but finite, effect introduced by recombination in the structure of the quasi-species 
distribution for the sharp peak case, is not a consequence of the MuUer's ratchet phenomenon characteristic of 
finite populations. Indeed, the shift in the wild-type probability Pq due to recombination, as predicted from our 
analytical equation Eq. ((68)) . was derived from the system of differential equations Eq. (|47)) . which describes the time 
evolution of an infinite population. Moreover, this closed analytical result is in excellent agreement with the numerical 
solution of the system of differential equations Eq. (j47p . as displayed in Fig. [8] and Table IVlTTl A good agreement 
between our analytical and differential equation results, which correspond to the infinite population case, and the 
stochastic simulation is expected when the later is performed in a large enough population. We determined that for 
the parameters we consider, M = 10 000 sequences provides simulation results that are independent of the population 
size for the sharp peak fitness function, thus allowing for a comparison with the infinite population theory expressed 
by the differential equations Eq. (|47|) and with our analytical solution Eq. ((68)) . 

Notice that for the quadratic fitness, the analytical theory reproduces the differential equation results within 
0{N^^). The convergence towards the theoretical value as a function of the system size for parameters within 
the 5*1 phase defined in Eq. (|7T|) . is displayed in Fig. [3 and for the S2 phase in Fig. [51 

As a final example, we apply our analytical solution Eq. (|62p and Eq. ([63)) to study the square-root fitness, /(u) = 
fc-\/|?x[, as displayed in Table IIX[ where analytical theory and direct numerical solution of the differential equation 
agree to 0{N-'^). 

TABLE IX: Analytical theory versus numerical solution for two-parent recombination in the parallel model for the square-root 
fitness f{u) = fci/H", with iV = 400, 800, 1000 and k/fi^A.O. 



u/fi 


ydiftoq^ AT = 400 


^cliffoq^ AT = 800 


ydifloq^^ ^ 1000 


„ analytic 


0.0 


0.6527 


0.6525 


0.65249 


0.6523 


0.1 


0.6650 


0.6672 


0.6678 


0.6710 


0.3 


0.6686 


0.6697 


0.66993 


0.6710 


0.5 


0.6696 


0.6703 


0.67043 


0.6710 


0.8 


0.6703 


0.6707 


0.67073 


0.6710 


1.0 


0.6705 


0.6708 


0.67083 


0.6710 
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As shown in Tabic IIXI two-parent recombination in the square-root fitness landscape enhances selection towards 
sequences which are on average more fit, as observed by a slight increase of the average composition u, with respect 
to the case when recombination is absent. This effect, which was already observed for the square-root landscape in 
the presence of horizontal gene transfer, can be attributed to the negative (see Fig|T|) epistatic interactions introduced 
by the square- root fitness, in agreement with the mutational deterministic hypothesis, Appendix [Nl 

An additional interesting effect in two-parent recombination, which was observed in the quadratic as well as in the 
square-root fitness landscapes, is the presence of spectral rigidity: the effect of recombination becomes independent 
of the recombination rate for i/ > 0. 

In summary, from our generalization of the parallel or Crow-Kimura model for an infinite population of evolving 
sequences Eq. ([M)) . we conclude that two-parent recombination introduces a mild mutational load over discontinuous 
fitness functions, such as a single sharp peak, and thus it can shift the error-threshold transition. For smooth fitness 
functions, the effect of recombination depends on the sign of epistasis (see Fig. [!}, in agreement with the mutational 
deterministic hypothesis [§, [13, [H, [31 ■ We show this analytically in Appendix [Nj 

In contrast with horizontal gene transfer, recombination affects the structure of the quasi-species (and the error 
threshold transition) for a sharp peak fitness. We believe that this fundamental difference between horizontal gene 
transfer and recombination is because of the fact that the latter can generate a much larger diversity in the offspring 
per recombination event. Hence, the diversity barrier that, as previously discussed in section II, is imposed by the 
sharp exponential distribution in the sharp peak case can be tunneled through due to the more radical mixing effects 
of two-parent recombination. Our analytical theory, which provides explicit expressions for the mean fitness /„ and 
average composition it, is developed in the realistic regime {N — > oo), considering that typical viral genomes are 
iV - 10^ - 10''. 



III. THE EIGEN MODEL 

In this section, we present a generalization of the classical Eigen model [l^, [2^, [2^, including the exchange of 
genetic material between pairs of individuals in an infinite population (49j , 

= ^ [BijCjkTk - SijSikDi]qk (72) 

Here, recombination as well as mutation are considered to be coupled to the replication process. Recombination is rep- 
resented by the coefficients Cjk, which in general will be functions of the frequencies qk, Cjk ~ Sjk+J2i li^li/ Sfc' Ik' ■ 



A. Horizontal gene transfer of non-overlapping blocks 



In this recombination scheme, we consider the exchange of blocks of genetic material between pairs of individuals 
in the population. We consider the blocks to be non-overlapping, such that we have N/M of them. We define a block 
index < 6 < N/M — 1, and a site index within each block to be Mb +1 < jb < M{b + 1). For this process, we have 
that the nonlinear recombination term in the differential Eq. (|72[) is 



C 



jk 



N/M 

N/M-l 

X n 



k + 



ly/M 
N/M 



6=0 



M(6+l) 



n u. 



jt=Mb+l 



1 + U{jb) 



1 - "(^6) 



n ^sL. 



(73) 



The recombination operator representing this process, assuming the recombination rate per block to be v /M , becomes 



R 



n 

6=0 



v/M \ 
N/M 



n 



v/M 
N/M 



n 



(74) 



Here, we defined the single-site recombination operator as Rj = aJ {j)Da{j), with the matrix D defined in Eq. 
We consider the large N limit, while keeping N/M ~ 0{N). Then, the recombination operator defined in Eq. ([73]) 
becomes, to order 0{N'^^) 



R 



e A-f e 



(75) 
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1. The Hamiltonian 



The Hamiltonian operator for the Eigen model, including the horizontal gene transfer process described by the 
operator Eq. (|74[) is given by 



H = Ne 



1 ^ - 



Nd 



1 ^ ^ 



(76) 



The microscopic fitness function is f{u) and degradation function is d{u). Here, the matrix D is defined as in Eq. 
We introduce a Trotter factorization of the evolution operator, in the basis of coherent states 



= lim 

M— oc 



M 



Zk 



.k=l 



\zm) 



As shown in Appendix |Hl the partition function is 

J [DiV£Vf}Dr]D^V(f\ 

Here, the action is defined by 



(77) 



(78) 



/ dt 

10 



<^ -nri-^4> + e-^(i-")-^/^""^+^**/(0 - d{0 



NlnQ 



(79) 



2. The saddle point limit 



We consider the saddle point limit of the action defined by Eq. ((79|) . In the saddle point limit, for long times, the 
trace defined by Eq. (jHlip becomes 



InQc 



lim ^ = ^ + [Mc + u4>c) + (77c + 

t^oo t 2 



,1/2 



(80) 



In this saddle-point limit, the action is given by 
lim — — = lim 



N,t->oc Nt t^oo Nt 



max <; /(Cc)e"^'^""^^"^+^'' 



c YcYc 



,1/2 



(81) 



As shown in Appendix |T] the mean fitness, defined from the saddle point action /,„ = limjv.t^oo hiZ/Nt — —Sc/Nt, is 

U = _ max^^ {e-Mi-.e«.)]-^{i-[0.(«.)]^^}^(^^) „ (82) 

Here, the expressions (/)c(Cc) and 7?c(Cc) are given by 

M+f(l-^.2)0f-i 



1/2 



(83) 
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TABLE X: Analytical results for horizontal gene transfer in the Eigen model for the square-root fitness f{u) = fcy^[u[+ 1, with 
M = 3. 



K 




analytic 


Q n 
o.U 


U.U 


U.004D 


o.U 


U.o 


u.ooyo 


o.U 


u.o 


u.o^zz 


o.U 


1 p; 

1.0 


U.04DD 


o.U 


U.U 


U.OOOO 


o.U 


u.o 


U.OD^Z 


5.0 


0.8 


0.3667 


5.0 


1.5 


0.3713 


8.0 


0.0 


0.3741 


8.0 


0.5 


0.3796 


8.0 


0.8 


0.3822 


8.0 


1.5 


0.3869 



^2. 



4 



,A/-ll 



-,1/2 



(84) 



The average eomposition, u, is obtained from the self-consistency condition = /(u) — rf(u). 

Eq. (|82[) is an exact analytical expression for the equilibrium mean fitness of an infinite population of evolving 
sequences. This analytical expression is valid for arbitrary permutation invariant replication rate f{u) and degradation 
rate d{u). 



3. Examples 

We consider first the quadratic fitness case, /(u) = ku^ /2 + k^. By expanding the formulas Eqs. ([82)) . ([83)) and ([M]) 
near the error threshold ~ 0, m ~ 0, we obtain the phase boundary from the critical condition 

fccrit = M^o } (85) 
1 + v/2ii 

We notice that the phase boundary is qualitatively similar to the horizontal gene transfer process analyzed in section 
II. A 2, Eq. for the parallel model. As in this former case, we notice that horizontal gene transfer introduces a 
mild mutational load against selection for a smooth fitness (i.e. quadratic). 

As a second example, we consider the square-root fitness landscape f{u) = kyj\u\ + 1. In Table 1x1 we evaluate our 
analytical Eqs. ([5^1M|) for this particular case. 

From the results displayed in Table 1x1 we notice that horizontal gene transfer increases the average composition u 
and therefore the mean fitness of the population. This effect, which is attributed to the negative epistasis introduced 
by the square-root fitness (see Fig. [T]) , is in agreement with the previous examples studied in the case of the parallel 
model, and with the mutational deterministic hypothesis [tI. [lOl. [ill, [l^ . as we prove in Appendix IMl 

As a third example, we consider the sharp peak fitness /(u) = {A — Aq)5u.i + ^o- In this case, the maximum in 
Eq. ((821) corresponds to = 1. From Eqs. (US]) and we have = (1 + u)/2, r^c = 0, and hence after Eq. ([82]) 



U =Ae ^ (^)"] (86) 

The error threshold is given, for u = in Eq. ([86]) . by the condition Ae^^^^i^^"^!"^ ' > ^o- However, we notice 
that /,n(it = 1) = Aer^'- > ]m(n = 0). Hence, in the selected phase we have u = 1 — 0{N^^). The fraction of the 
population located at the peak Pq is obtained from the self-consistency condition f„i = APq + ylo(l ^ ^o) 

^° - A-Ao ^^^^ 
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After Eq. (|87p . we find the true error threshold at A^rit = ^oe^', while the condition Ae~^~~^^~'^ ^ > Aq represents 
the limit of metastability for initial conditions with u ~ 0. We notice that this result is similar to the exact solution 
in the absence of horizontal gene transfer [s^. Hence, as previously discussed in section LA. for the parallel model, 
we conclude that horizontal gene transfer docs not affect the structure of the quasi-spccics for a discontinuous, single 
sharp peak fitness. 



B. Horizontal gene transfer for multiple-size blocks 



In analogy with the model treated in Section II. B, we consider the natural extension of horizontal gene transfer of 
blocks with multiple size, with average (M) and {M)/N — 0{N~^). Following a similar analysis as in the derivation 
of Eq. (|25p . we define the recombination operator for multiple-size blocks as 



1. The Hamiltonian 



We consider horizontal gene transfer to be coupled to the replication process. Moreover, we will consider that when 
replication occurs, a horizontal gene transfer event also occurs with a probability < v/{M) < 1. The Hamiltonian 
operator for the Eigen model, including the horizontal gene transfer process described by the operator Eq. (|88p is 
given by 



(M) (M) 



X/ 



1 ^ ^ 

J^^a''{j)<J3a{j) 



3 = 1 



Nd 



1 ^ ^ 



i=i 



We introduce a Trotter factorization 



-Ht 



M 



M 



lim / [Dz*Vz\\zm) T\{zk\e- 



■eHl 



Zk-1 {Zo\ 



As shown in Appendix [jj the partition function is 



Here, the action in the continuous time limit is 







+e 



-A'(i-f))ri _ 



(M) (M) 



(89) 



(90) 



(91) 



(92) 



2. The saddle point limit 

The saddle point limit is exact as ^ oo in Eq. ([92]) . After a similar procedure as in Section 3. A. 2, we find the 
saddle point equation for the mean fitness 

/™ = _m- ^ {e-(--)[l - ^ + ^e-(^^)(-^^)]/(e.) - .(C.)} (93) 
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Here, the fields rjc and (f>c are expressed as functions of 



(M) 



1 - 



(M) 



{A/)(l-0e) , 



1 - 



1/2 



(94) 



0c(fc) 



,(M)(l-0e 



i/(l-ti^) 
2/i 



1 - 



(M) 



2m 



4^2 



1/2 



(95) 



Equations (|93p - (|95p represent an exact analytical solution for the equilibrium mean fitness of an infinite population 
experiencing horizontal gene transfer of variable blocks size. This expression is valid for arbitrary, permutation 
invariant replication rate f{u) and degradation rate d{u). 



3. Examples 



We consider first the sharp peak fitness f{u) = (A — Ao)Su,i + Aq. In this case, the maximum in Eq. ([55)1 is at 
= 1- From Eqs. (jM]) and ([95)) . we obtain rjc = and (l>c = {I + u)/2. Substituting these values in Eq. ([93)) . we 
obtain for the mean fitness 



-(Af>(l-«)/2 



(M) (M) 



A 



(96) 



The error threshold for 



-(A7)/2 



> Aq. 



- is obtained from Eq. ([961) by the condition Ae ^ 1 

However, we notice that f„i{u = 1) = Ae~^ > fm{u = 0). Therefore, in the selected phase the average composition 
M = 1 — ©(iV"^), and the effect of recombination becomes negligible for the sharp peak fitness. The fraction of the 
population located at the peak Pq is obtained from the self-consistency condition /,„ = AP^ + Aq{1 — Pq) 



Ae-^ - Aq 
A-Ao 



From this expression, we find that the true error threshold for the sharp peak fitness is Ac 
Ae 



1 



(M) + (Af)'^ 



(97) 



e^^Ao, with the condition 



0. 



> Aq representing the limit for metastability for initial conditions with u - 

As a second example, we consider the quadratic fitness f(u) = kv? /2 + k^. An analytical expression for the phase 
boundary is obtained from Eqs. 193]) . fM]) and ([95)) near the error threshold 0, u ~ 0. We find 



(98) 



1 + ^ 

fccrit = Aifco 



1 + — 



1. Analytical results, as obtained from Eqs. 



For small v, the critical value is fccrit ^ ko{^, -\- v / 2) . 

As a final example, we consider the square-root fitness f{u) = 
([M)) - ([^5|) for this case, are presented in Table IXll 

We notice that the results obtained for the horizontal gene transfer process with variable block size agree with the 
corresponding ones when the size of the recombination blocks is fixed. We recall that this correspondence was also 
observed and discussed in the previous section for the parallel model, so similar arguments apply to the Eigen model 
as well. An analytical proof is provided in Appendix [Ml 



C. The Eigen model with two-parent recombination 



For the Eigen model, we introduce the recombination process described in Section H.C and illustrated in Fig. 
131 which considers the exchange of genetic material between pairs of sequences due to crossovers governed by the 
polymerase switching from one parental chromosome to the other with probability pc per site. For the Eigen model, 
mutation and recombination are considered to be coupled to the recombination process, as stated in the generic 
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TABLE XI: Analytical results for horizontal gene transfer in the Eigen model for the square-root fitness f{u) = k^/\u\ + 1, 
with (M) = 3. 



k 


1/ 


^analytic 


3.0 


0.0 


0.3346 


3.0 


0.5 


0.3409 


3.0 


0.8 


0.3450 


3.0 


1.5 


0.3546 


5.0 


0.0 


0.3588 


5.0 


0.5 


0.3654 


5.0 


0.8 


0.3695 


5.0 


1.5 


0.3794 


8.0 


0.0 


0.3741 


8.0 


0.5 


0.3809 


8.0 


0.8 


0.3851 


8.0 


1.5 


0.3950 



differential equation Eq. ([72|) . We will consider that during replication, a sequence can recombine with probability 
1/ < 1, OT just replicate without recombining with probability 1 — v. This process is represented by the coefBcients in 
Eq. ^ 



C,k = (1 - i^)<5,- + ^ 

{a„=±l} 

1=1 n=l 



n=2 



(99) 



Here, again, pi =■ qi/ 1i normalized probability for the sequence 1 < ^ < 2 

odel Hamiltonian by the 

\l^v)i + vg[{d){j)D]^{m 



In the spin Boson representation, we express the Eigen model Hamiltonian by the operator 

N 

N 



X./ 



1 " ^ 

-jT^o) {j)a^a[j) 



Nd 



1 ^ ^ 



(100) 



Here, was defined in Eq. ([37]), and the matrices were defined in Eq. 

factorization 



-Ht 



M 



Imi^j [Vz'Vz\\zm) [ \{{zu\e-'"\zk-i) ) (fo| 
As shown in Appendix |Kl the partition function is 



Wc introduce a Trotter 



(101) 



Z = V£V^VfiV'qV(j)D(j)e 



^S[l.i,f)..-q.i,4,] 



(102) 



Here, the action is defined by 



-N f dt' [-^^ -nv-H 

+ 6-^(1-") (1 - + i^5(0))/(O - did -NlnQ 



(103) 
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The saddle point limit 



For long times, a steady state condition is achieved. Then, the fields become time-independent, and we have 



lim ~ — 

t— >CX3 t 2 



,-| 1/2 



(104) 



We look for the saddle point solution from the action 



iim — lim 



N,t^oo Nt t^oo Nt 



max 



-I 1/2 



(105) 



Because of the singular behavior of the function g{(t>c), to find the saddle point we need to consider three separate 
cases: (pc < 1, = 1, and (/)c = 1 — 0{1/N). We notice that the saddle point analysis may not apply exactly, unless 
ff(</'c) = S^^^i. 

Case 1: 0c < 1- The mean fitness is given by 



(1) 



max 1(1 

-i<Cc<i 



,),-4^-VT^]/(ec)-d(Cc)} 



(106) 



We note 
Case 2: 



is still given by Eq. ([61]) . 

()c = 1. The mean fitness is given by 



(2) _ 



max |e 

-l<?c<l 



772 



/(Cc) - di^} 



(107) 



Case 3: (pc = 1 ~ 0{1/N). In this case, additional analysis is necessary to calculate the mean fitness due to 
the singular behavior of the gicpc) function. For a smooth fitness function, we can argue this case does not exist. 
We first consider the Hamiltonian pOOp for the case g = 0. In this case, the fitness function is simply multiplied 
by (1 — i/). If the degradation function is zero, the largest eigenvalue, fm is simply multiplied by (1 — i/) relative 
to the V = case. Without degradation, this result allows us to calculate the average composition, u*, from the 
implicit relation fmii^) = (1 ~ v)fm,{v = 0) = fius^). With a non-zero degradation function, the equation for 
fm{v) will be a bit more involved. Alternatively, if we consider the differential equation for the unnormalized class 
probabilities, dQ/dt = LQ, we see that the differential operator L looks like that in the absence of recombination, 
save for a multiplication of (1 — v) in the fitness function. Thus, the variance of the population is given by (ssj 
c'^/N = 2^u*(l — v)f{ui,)/\N{{\ — !/)/'(«*) — d'{u^,))]. Considering more carefully the g function, we find as before 
this term is exponentially negligible compared to the —i'P{u) term when cr^ < f^- In other words, we must strictly 
be in case 1 when 

1 - < 2//M,(l - V)f{u^)/[{1 - V)f{u^) ~ d'{u^)] 

We denote the value of v at which 



(108) 



l-ul = 2/iu*(l - i/)/(u,)/[(l - v).f'{u.^) - d'(u*)] at v = v^^ 



(109) 



as Vi,. Now, at this value of v^, we have J duidu2Rl^-^u^P{ui)P{u2) = P{u). Thus, the term proportional to v in 
Hamiltonian (jlOOp exactly vanishes. Thus, we have dfm/dv = and dP{u)/dv = at this value of v. There is 
spectral rigidity. This result implies that for > the distribution P{u) is independent of and that the value of 
M* is constant. In other words, the value of fm in case 2 must be constant with v. Assuming /,„ varies continuously 
with V in case 1, and that the fitness values for case 1 and case 2 are equal at a single value of v, which mathematically 
may be negative, case 2 is simply case 1 with the value v = 



(110) 



Equations (|106p . (|107p constitute an exact analytical expression for the equilibrium mean fitness of an infinite pop- 
ulation of sequences evolving under the dynamics of the Eigen model, and experiencing two-parent recombination. 
These equations are exact for a smooth, permutation invariant replication rate f{u) and degradation rate d{u). 

For a non-smooth fitness function, additional analysis is necessary, since f {u,,) — d'(u^) is undefined, and P{u) may 
no longer be Gaussian. 
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2. Examples 



We investigate the phase diagrams, as predicted from our theoretical equations, for two different fitness functions: 
A sharp peak and a quadratic fitness landscape. 

As an example, we consider the sharp peak fitness, f{u) — {A — AQ)du.i + ^o- The maximum is obtained at = Ij 
M = 1 - 0{N-^). From Eqs. p07)) and (fT06)) wc have 



(2) 



Ae-" > = (1 - ,y)Ae- 



(111) 



Hence, for the sharp peak fitness a single selective phase is observed. In this case, the function g{(j>c) is not exactly 
a Kronecker delta ^</,^,i, we are in case 3, and then we expect to observe a small correction, approximately linear 
in 1/ from the prediction of the saddle point analysis. By considering the differential equations for the sharp peak 
case at zeroth-order in v, we find that the class distributions satisfy e^^/^ ^j,(rfe/A^)P^°^/2'^ — 



with Pr 



(0) 



(Ae-^ - Aq)/{A - Ao) and /, 



(0) 



AP^°'> + Ao(l - P^"') = Ae-^'. Thus we find S = J^iPi"'/^' 



(0) /oi 



{A - Ao)Po^''^e"^/V(/m' - Aoe"^/2) ^ (^g-M _ Ao)e~''^'^ / {Af-'' - Aoe-^/^). Thus, we find the recombination term 
^}^(rk/N)P^^ /2^^iPI'^'' = Aer^l'^S'^. Hence, wc find that at first order in v, the fraction of the population 
located at the peak is given by 



Ae-^ - Aq 
A~A. 



A- A. 



Ae~^'^ 



Ae-^" - An 



(112) 



We note that this value of /,„ = AP^ + Ao(l — Po) interpolates between fin' for Ae /Aq = 1 and a value intermediate 
to fln^ and fln^ for Ae-^'/AQ = oo. 

As a second example, we consider the quadratic fitness f{u) ~ kv? /2 + k^. By maximizing expressions Eq. (|107p 
[59| and Eq. (|106p . we obtain two selective phases SI and S2, and a non-selective phase NS, defined by the equations 



51 : u 

52 : u 
NS: u 



1/2 



< min(i^,, j/c) 



1 — 2/ifco/fc 



-,1/2 



1 + M 
0, otherwise 



(113) 



where in the SI phase 



and we have defined 



= 2[v/l + A^2(l + 2fco/fc)-l - A^'fco/fc]//^' 



= l-^e4^-^]/(e,V2 + Vfc) 



(114) 



(115) 



where is given by Eq. pi4p . The phase structure is defined by the conditions: For 2/ifco/fc > li the system is in 
SI if I' < Vci or in NS if > Vc] for 2/ifc'o/fc < 1, the system is in Si if < v^,, or in S2 if > i/*. From Eq. (jll5[) . we 
notice that at 2/ifco/fc = 1; Vc = v^. 

Wc note that the phase transition between case 1 and case 2 is exactly as predicted by Eq. (|109p . We further note 
that the mean fitness is independent of for > t/, , exactly as predicted by Eq. (|110p . 

As a final example, we consider the square-root fitness f{u) = fcy^juj -|- 1. By maximizing expressions Eq. (|107p 
[sot and Eq. (|106p for the square-root fitness landscape, we obtain the results presented in Table IXIIl From the results 
displayed in Table IXIIi we observe a similar qualitative behavior as in the two-parent recombination for the parallel 
case. Table IIXI In the square-root fitness, recombination introduces a favorable effect over selection, which can be 
attributed to negative epistasis (see Fig. [T|) according to the mutational deterministic hypothesis [tI. fiol fill, fl^ . as 
shown in Appendix [Ol Spectral rigidity is also observed in this case when v > 0. 
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TABLE XII; Analytical results for two-parent recombination in the Eigen model for the square-root fitness f{u) = k^/\u\ + 1. 



k/)i 


u/fi 


„ analytic 


4.0 


0.0 


0.3493 


4.0 


0.1 


0.3892 


4.0 


0.2 


0.3892 


4.0 


0.5 


0.3892 


3.0 


0.0 


0.3346 


3.0 


0.1 


0.3892 


3.0 


0.2 


0.3892 


3.0 


0.5 


0.3892 



IV. CONCLUSION 

We have generalized two classical models of evolutionary biology, the Crow-Kimura and Eigen models. We have 
introduced inter-individual transfer of genetic information to these models, bringing them closer to the modern 
understanding of evolutionary biology. For both models, we showed how to incorporate horizontal gene transfer. 
We showed that these generalized models may be written in an equivalent field-theoretic formulation. This mapping 
allows us to apply the powerful mathematical techniques of quantum field theory to obtain exact analytical solutions. 
For fitness landscapes that depend only on distance from a wild-type genome and for long genome lengths, we are 
able to solve for the mean population fitness for arbitrary functional forms of the fitness. Horizontal gene transfer 
of M genetic units was shown to be analogous to horizontal gene transfer of one genetic unit, with a suitably scaled 
horizontal gene transfer rate. 

We also showed how to incorporate recombination to these classical models, as might occur in viral super- or co- 
infection. This case seems at first glance far more non-linear, since on average half of the genetic material is taken 
from each parent to make the child, rather than 0(1) genes as in horizontal gene transfer. Somewhat surprisingly, we 
were able to exactly solve the two-parent recombination case for both the Eigen and Crow-Kimura model as well. In 
the limit of a long genome and for fitness landscapes that depend on the distance from a wild-type genome, we find 
that the mean population fitness is independent of the average cross-over length in the recombination process. We 
also find two selected phases. The phase for large recombination rates is spectrally rigid, with the mean fitness and 
population distribution independent of the rate of recombination. 

We proved the mutational deterministic hypothesis holds for horizontal gene transfer or recombination in both the 
parallel (Kimura) and Eigen models. That is, horizontal gene transfer and recombination reduce the mean fitness 
in the presence of positive epistasis and increase the fitness in the presence of negative epistasis (see Fig. [T] and 
Appendices [H IM El and [0|) . 

For a discontinuous, sharp peak fitness landscape, we found that horizontal gene transfer does not affect the 
structure of the quasi-species distribution or the error threshold transition. For the sharp peak fitness function, the 
only appreciable effect of horizontal gene transfer is related to the potential emergence of metastability depending 
on the initial conditions, and we analytically determined the region of parameters space in which this situation 
may occur. On the other hand, even for the sharp peak fitness function, two-parent recombination induces enough 
mixing to enhance diversity in systems evolving under a sharp peak replication rate, thus changing the quasi-species 
distribution and shifting the error threshold transition. We found explicit analytical expressions for this shift. 

For smooth fitness landscapes, these genetic transfers affect the steady-state population distribution and mean 
fitness. Recombination and horizontal gene transfer may, of course, dramatically change the dynamics of the evolution 
process as well. The most dramatic impact of these exchanges of genetic material is expected for fitness landscapes 
that have a correlated, biological structure that is conjugate to these exchanges Analytic investigation of such 
correlated fitness landscapes is perhaps one of the next steps in the development of modern theories of evolution. 
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APPENDIX A 



We consider Eq. ([9]) for horizontal gene transfer of blocks of fixed length A/ in the parallel model. For e — t/M and 
Af oo, we have 



(zfe|e-^^|4-i) 



(zfe|4_i) - e{zk\H\zk-i) ~ {zk\zk-i)e 



. (?fc|g|?fc_i> 
(«'fci«'fc-i) 



(Al) 



For the Hamiltonian matrix elements in the coherent states basis, we obtain to order 0{N^) 



{zk\H\zk-i) 

{Zk\zk-l) 



= Nf 



1 " 

-^^zl[j)aiZk-i{3 



i=i 



N 

■ iJ'^[z*k{j)(^iZk-i{i) - 1] 



N/M-l 

- E 

6=0 



M(6+l) 

n mb)Dzk-i{jb)-i 

jt=Mb+l 



(A2) 



We introduce the auxiliary field 



1 ^ 

Ck = zl{j)<T3Zk^l{j) 



(A3) 



and the conjugate field ^k to enforce the constraint via a Laplace representations of the delta function. Substituting 
into Eq. (M\i into Eq. we obtain 



-Ht 



lim / iVz'Vzl 



M 



n 

Lfc=0 



ieNd^kd^k 



2tt 



?m){zo\ 

xe^fc=i^f=i{~i/2K(i)-i*fc(i)+?:_iO)-?*,_iO)-2i^O)-?j,_i(j)]+e[i'-^ 



(A4) 



The contribution of the interaction term j^Ylb=o^ ^ Y\^l-Mb+i^i^b)Dzk^i{jb) to the partition function can be 

treated to arbitrary order in perturbation theory using the formula Z = ZQ{e^^^)o, and its contribution shown to be 
site- independent. Moreover, this reference perturbation theory has 0{N^^) fluctuations. Thus, it can be shown that 
with an error 0{M /N) at all orders in perturbation theory, we obtain the same partition hmction when substituting 

this interaction term by jj I •^(i)-Dzfc-i(j) ) . Therefore, we define the auxiliary field 



1 ^ 

j;^'^z^ij)Dzk-i{j). 



(A5) 



We obtain the partition function from the trace of the evolution operator, Eq. (|A4[) . projected onto physical states 



Z = Tr 



N 



3 = 1 



2tt 



lim 



M 



n '^4'Dzk 



-S[z',z\ 



(A6) 



By inserting Eq. (jA5|l . we obtain 



r2TT 



X 







ndXj 



27r 







' M 


iXj 


I 


l[VzlVzk 






.k=0 



(A7) 
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The matrix S{j) in Eq. (jA7p is defined by 



S{j) = 



I 








A2 


/ 








-A3 


/ 
















V ... -Am I J 



Here Ak = I + e(CfcO-3 + A^o-i + cjykD). 

After calculating the Gaussian integral over the coherent state fields, we obtain 



lim / n 



dX 
"2^ 



27r W 



M 



=-Ef=iEt'i=i?;:o)s.,(j)?,o-) 



lim / Y\^e-'^ndetS{j)]-' 
M^^J^ ii 27r ^ ^^^J 



i=i 

27r N 



M^oo /n 27r 



i=i 



= lim rrTrfe^^"=i(«^"^+^"i+'^^^) =Q^, 



where T is the time ordering operator and 

Q = Tr Te^° dt'{^a3+^lal+cj^D) 

With this result the partition function in Eq. (|A7p becomes Eq. ([TT 



(A8) 



(A9) 



(AlO) 



APPENDIX B 



From Eq. (|13p . we obtain the saddle-point equations with respect to the fields £_c: 4>c for horizontal gene transfer of 
blocks of fixed length M in the parallel model: 



\ Nt 



2^c + m 



1/2 



= 



(Bl) 



5 f-Sc 



S(j>c V 



Then, the system of Eqs. (|B1[) and (|B2p reduces to 



Cc + f^ 



1/2 



0. 



(B2) 



1/2 



(B3) 



^c(^c + U0c) + ( M + 



1/2- 



(B4) 
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We eliminate ^c, <j>c, to obtain 



max <j f(£^ - u - ^ + + J 2(f)c - 1 - <c) 



2m1/2 



Finally, wc look for an cxtrcmum in 



-Sc 



2pi 



fi\u\ 



0. 



We solve for (pc as a function of from this equation 



-I 1/2- 



Substituting into Eq. jBSj , we obtain for the mean fitness or average replication rate Eq. ([T 



(B5) 



(B6) 



(B7) 



APPENDIX C 

We consider Eq. (pS)) for horizontal gene transfer of blocks of variable length in the parallel model. For e = t/Ad 
and il/ — > cx), wc have 



(?fc|g|?fc-i> 

{zk\e^'"\zk-i) ~ {zk\zk-i) - e{zk\H\zk-i) ~ (4|4-i>e ^^^TC^. 
For the Hamiltonian matrix elements in the coherent states basis, wc obtain 



{zk\H\zk-i) 

(Zfc|ffc_l) 



= Nf 



We introduce the fields 



1 " 

j^'^zl{j)a3Zk-iij 



i=i 



N 



1 ^ 



V 



(M) 



(CI) 



(C2) 



(C3) 



1 ^ 

j;^^K{j)Dzk-i{j) 



(C4) 



and the conjugate fields 4>k and ^j. to enforce the constraints via Laplace representations of the Dirac delta functions. 
Substituting into Eq. ([^5]) . we obtain 



-Ht 



M— too 



lim / [Vz*Vz\ 



M 

n 



ieNd^kd^k ieNd(t>kd(t>k 



27r 



27r 



\zm){-M 



xe 



35 



We obtain the partition function from the trace of the evolution operator Eq. (|C5[) 
Z = Tr [e--^*pj 



N 



27r 



hm 



M 



n ^^^^fe 



.fc=i 



-S[i",z| 



By inserting Eq. (jC5[) . we obtain 



Jo 



AT 



977 



27r 



The matrix S'(j) in Eq. ((U7)) is defined by 





" M 




/ 


n ^^^^"fc 








.fc=i 




/ 





... -e^^^^i \ 




-As 


/ 










-As I 



















... - 


Am I J 





2Af =e'-^zo 



where Ak ^ I + e(^fccr3 + ficji + (t>kD). 

After calculating the Gaussian integral over the coherent state fields, we obtain 



i=i 

2n N 



M 



n '^^k'Dz, 



1™ / ni^^-^'MdetSOT^ 



27r N 



hm / n— 

L J-l 27r 



g-jA J g-^Tr ln[/-e*^J f Gxp(£ j:;!^^ «fcT3+MTi+0fc-D)] 



hm TT Tr fe'T.tUi^-^+f^-^+^^D} ^ qN 



where 



With this result, in the limit M oo, the partition function in Eq. (jC7p becomes Eq. 



(C6) 



(C7) 



(C8) 



(C9) 



(CIO) 



APPENDIX D 



We consider recombination in the parallel model. For the Hamiltonian matrix elements in the coherent states basis, 
we obtain to order 0[N^) 



{zk\H\zk-i) 



(Zk\Zk-l) 



Nf 



1 ^ 



N 



+j.iV(.g[{zlO-)^'zVi(i)}]-l) 



(Dl) 
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where the matrices are defined by Eq. ((38)) . We introduce the auxihary fields 



1 " 



(D2) 



and the conjugate fields to enforce the constraints via a Laplace representations of the delta functions. Substituting 
into Eq. (|40p . we obtain 



e-"* = lim / \Vz*Vz\ 



M 

n 

,/£ = l 



leNd^kd^k 
2ti 



\zm){M 



y^^T.'^LiT.^=i{H^/mi(])-^k(3)+r^-^U)-Zu-i{3)-2r^{3) 
y^^-'^j:"=i[^kik+pi+u-f{(,k)-va{{si{j)D\zk^t{j)})\ 



We obtain the partition function from the trace of the evolution operator, Eq. (|D3p . for recombination in the parallel 
model 



Z = Tr 




-L 









2ix 



N 






" A/ 


n 


lim / 


n ^^i^^^'^ 











It is convenient to define the auxiliary field 



1 ^ 



(D4) 



(D5) 



and the corresponding (pk to enforce the constraint by a Laplace representation of the Dirac delta function. From Eq. 
(lD4l). we have 



27T 



J--'- 27r 







' M 


iXj 


I 








.k=l 



(D6) 

Here, for large N the function g{(f>) has the singular behavior g{(j)) ~ unless = 1 — ©(l/A^). We also notice 
5(1) = 1. The matrix S{j) in Eq. 06^ is defined by 



( 


/ 








... ~e'^^Ai\ 






/ 















/ 
















V 










-Am I ) 



(D7) 



Here, Ak ^ I + e(^fcCT3 + jiai + ipkD). After calculating the Gaussian integral over the coherent states fields, we obtain 



27r N 



lim / TT — 



i=i 

27r JV 



A/ 



n ^^^^"fc 



jiui n§»-"M<ie.5(i)i- 



J 

27r JV 



hm / TT 



dX 



i=i 



27r 



i g-iAjg-Trln[/-e'^jf cxp(£ EfcL 1 Cfc'73+A"Ti+0fc£l)] 



N 



lim TT Tr f e^^" = 



(D8) 
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where in the continuous hmit 

With this result, the partition function in Eq. (|D6p becomes Eq. (|59p . 



(D9) 



APPENDIX E 



The recombination operator 



For the recombination process, we consider that in the first step, the polymerase enzyme starts the copying path 
in either of both parental chains with equal probability 1/2. Then, at each site, it can jump to the other chain with 
probability < Pc < 1/2 or continue along the same chain with probability 1 — pc- 

As presented in Section II. C, this process is represented in the general differential Eq. ([T]) by the coefficients in Eq. 



Rl 



1 



1 + s5:'si 



{a,=±l} 
X[(l 



1 + s[s\ 



1 + Qie»2 f 1 + S2S2 



X [(1 - Pc) 2 Pc ^ ^ ' 



l + °3 



1 + 44 



1 + 44 



X ... X 1 - Pc) = Pc 



The operator for this process in the Schwinger-boson representation is presented in Eq. ((3^ 



(El) 



2 

^ = IJ2p^ E [/i'^^Ki)'^]x[(i-Pc) 



1+0102 

^ Pc ' . 



;=i {q,=±i} 

+ , ^ . 1-02 ■ 



x[/2-^i?K2)-^] X [(1 -pj^^pr^] X [i^Ms)^ 
x...x[(l-p,) 2 - ]x[/^^ Ri{N) — \-I 



Here, we define the single-site recombination operator as i?;(j) = {j)D^^a{j\ with 

/ l + .s' \ 



2 i 2 



(E2) 



(E3) 



2 .... /V 

and p/ = 'i'i/X];^! 9' ^'^'^ normalized probability for sequence 1 < ' < 2 . 

It is possible to group the different terms in the form of Ising-like traces, by using the definition J = —(1/2) ln[pc/ (1 — 

Pc)], 



5({i?;0-)}) = ^[2cosh(J)] 



N 



-(AT-l) 



l=\ {«j=±l} J=l 



1 + a,- - 1 — a, - , , 



(E4) 



After the representation in terms of coherent states fields, we have Ri{j) z^{j)DjZk-i{j) = ipj, and correspondingly 
9 5({^'}) 



N 



-{N-l) 



1=1 {^^=±1} j=i 



1 + a, 1 — a, , , 

9 9 



(E5) 
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It is convenient to reorganize this expression as 



N 



<?({^'}) = ^[2cosh(J)]-(^-i)5]p,n 



1=1 3 = 1 



We define the transfer matrix 



1 + ^] 



E 

{a,=±l} 



N 

n 



1 + a 



' 1 + V ■ 



(E6) 



T 



e 



(E7) 



with eigenvalues A+ = 2cosh(J) and A_ = 2sinh(J). 
The Ising trace in Eq. (|E6p is given by 



{".=±1} 



^ (K|T^-Vi) + ("i|T^-V«i)) 

{ai=±l} 

Tr[r^-i] +Tr[T"-Vi] 



2A 



N-l 



2 [2 cosh(J)] 



By considering this formula, and expanding the product in Eq. (jE6[) . we obtain 



(E8) 



2 ^ /l + V/' 



N 



■0: 



N 

E 

J l<A;<r) 



1 + Vl. 1 + V'L 



-, ^ , 1 + V"!. 1 + V'm 1 + V'ri 

l<A:<m<n ^™ ^" 



AT 



+ (aia2 . 



1 - v' 



In this notation, we defined the averages 

{akO-i . . .) 



2A 



— y 



^akai . . . 



{aj=±l} 



(E9) 



(ElO) 



We present the first and second order averages, to illustrate the general technique to obtain the higher orders. 
The first order average is 



Oik 



Q,=±l 



1 



2A^: 



N-l 



(Ell) 



To evaluate the trace, we introduced the matrix P which diagonalizes the transfer matrix T 



(E12) 
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We use the identities 

(E13) 

Substituting into Eq. (jElip . we obtain 

<->^^^(2;)('o^^)"'('o^^)}- 

a result we expect due to the symmetry of the Hamiltonian in Eq. (jElip . Following a similar procedure, we can 
express the second order correlation in the form 



1 Xl-' r'\ A!"-^- rM A^- 



^k—l + N—m^m — k / ^ \ m — k 

^ =(a;, 

= (tanh(J))""* = (1 - 239,)™^'^ (E15) 

From the same analysis, we prove that the correlations for an odd number of a' s vanish, whereas those for an even 
number become 

/ \ \ l — k-\-n — m+... 

{auaiaman ■ . ■) = ij^j = (tanh( = (1 - 2p,)'-^-+"-'"+- (E16) 

Substituting into Eq. (jEOp . we obtain the finite series representation 

1=1 j=l \ / l<fc<(Ti ' 

, Q _ 2d v--fe+9-« i-V-i 

^ 1 + V'i 1 + V'i, 1 + V'i 1 + V'^ 

l<fc<m<n<g ^9 
- ' 1 — 



Finally, we can obtain the alternative representation 

2 



X 



2 2 2 

l<fe<m 

V fi - 20 y'«-fe+9-« ^~^fc 1 - 

^ ^ ^'''^ 2 2 2 2 

l</c<??i<n<q 

j^k,m,n,q j=l 



(E18) 
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APPENDIX F 

For the case of uniform crossover recombination, pc = 1/2, a simplified analysis can be carried out to obtain the 
large N, or Gaussian limit, of the recombination coefficients R^_^ because permutation symmetry is exactly obeyed. 
For the child sequence created from parental sequences with number of sites as ni and 71,2, the number of child 

sequences, n, with sites is given by the expression 

i=i ^ ' 

Here, the path followed by the polymerase while copying from cither parental sequence is parametrized by the random 
variables on = ±1, with (q^) = and (aiUj) = Sij. From Eq. (jFl[) . we obtain the corresponding expression for the 
average composition of the child sequence, u = {N — 2n)/N 



N 

i 

U ■ 



N 

1=1 



From Eq. (jF2p . we obtain the average 
To obtain the variance, we calculate 



^1:2 



N 

i=l 



(F3) 



1 ^ 1 ^ 

i,j=l i=l 
N 

I 

1=1 



2, 



Therefore, we obtain the variance as 



Hence, in the large N Gaussian limit, the recombination coefficients are given by the distribution 

-iV[(«i+«2)/2-«]V(l-"^) 
no , (F6) 

where /m = /("«*)• 

For < 1/2, making the ansatz that correlations between spins at different sites remain 0(N~^), the additional 
contributionto((fe)2)„is [1/(4A^^)] E.^,(l-2p.)l-^l(4-^f)(4-.s,^) - [l/(2iV')] Efc>o E.(l-2pc)'^-(4-sf)(4+.- 
^ [l/(2iV)]S,>o(l-2p,)'^-[(si-s2)2+0(l/Ar)] ^ [l/(4p,iV)][0(l/ViV)2 + 0(1/7V)] ^ const/TV^^ and the large 
N limit becomes that of the Pc = 1/2 case. 

APPENDIX G 

We consider the saddle point condition for recombination in the parallel model. First, we look for the saddle-point 
condition with respect to the fields ^c, '/'c 

+ '-^^ (Gl) 
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6(j)c V Nt 
Eqs. (inil) and become 



1/2 



2Cc + m 



1/2 



U^c + M + 



1/2 



By combining Eqs. (|G3P and (|G4p . with the saddle-point action Eq. ([58|) . we obtain Eq. ([591 



(G2) 



(03) 



(G4) 



APPENDIX H 

We consider horizontal gene transfer of blocks of length M in the Eigen model. The matrix elements of the 
Hamiltonian in the basis of coherent states are given by 



{zk\H\zk-i) 

{Zk\zk-l) 



1 ^ 

^'^zl{j)a3Zk-i{:j 



xe 



N 



-Nd 



We introduce the auxiliary fields 



1 ^ 



3 = i 
N 



Vk 



J^^zl{j)aizk-i{j) 



(HI) 



(H2) 



(H3) 



and the corresponding conjugate fields ^k, Vk to enforce the constraints via Laplace representations of the Dirac delta 
functions. Therefore, Eq. ([77]) becomes 



-Ht 



lim / [Vz*Vz\ \zM){za\ 



M . 



n 

.k=l 



ieNd^kd^k i^Ndrjkdrjk 



27r 



2tt 



xe 
xe 

xe 



-l/2EfciiKO")-?fcO")+5'fc-iO)-5'fc-iO")-2?*(j)-?fc_i(j)] 



/(a)-<i(a)] 



(H4) 



At this point, a perturbation theory analysis similar to the case of the horizontal gene transfer of finite blocks in the 
Kimura model leads us to conclude that to within error 0{M /N) at each order in perturbation theory, it is possible 
to substitute the recombination term by 



JL 1 

M N 



N 



M 



(H5) 



i=i 
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Then, it is convenient to introduce the auxiUary field 



1 ^ 



(H6) 



i=i 



and the corresponding (j)k field to enforce the constraint through a Laplace representation of the Dirac delta function. 
The partition function is obtained from the trace of the evolution operator in Eq. (|H4[) 



Z = Tr 



2lT 



2tt 



lim 

M^oo 



' M 

n '^^ki^^k 



~S[z%z| 



(H7) 



Thus, we obtain 



Z = ^lim / [V£V£T>fiVffD. 



dX 



2tt 



A/ 



.k=l 



The matrix S{i) in Eq. pI8)) is defined by 



/ / 

-A2 / 
-A. I 



\ 



-Am I ) 



Here Ak = I + e(^fccr3 + Vkf^i + (/>fc-D). 

After calculating the Gaussian integral over the coherent states fields, we obtain 



27r N 



lim / TT^ 



— 'iAi 



i=i 

,27r W 



27r 



M 



Zk 



.k=l 



/ nT^^^''Mdct5(j)]- 

.7 = 1 



27r 



i=i 



lim / IT ^g-iAjg-Ti-lnlZ-e'-^jf oxp(cEfcLi?fc'^.3+*)fc'Ti+0fcD)] 

/„ J-i 27r 



AT 



lim rrTrfe^2:;:Li(4-3+rl.^i+0.c) ^giv 
A/^00 



where 

With this result the partition function in Eq. (|H8P becomes Eq. (|78p . 



(H8) 



(H9) 



(HIO) 
(Hll) 
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We consider the saddle-point equations for horizontal gene transfer of blocks of length M in the Eigen model: 



\ Nt 



U^c + U4>c) + [fie + ^ 



1/2 







(II) 
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-Se 
5(t)c V Nt 



-Sc 



u^c + r]c + ^ 



£.ci£.c + U(j)c) + '7c + 



-,1/2 



= 



.A/-1 -M(l-r,c)--^ + -K- 



'h -r 2 



mc) = 



= 



Stjc V Nt 
We obtain the following identities 



Cc(?c + U(j)c) + ?7c + 



_ + M0e/2 

_ fjc + <^e/2 

[Ccfe+W0c) + (^c + 0c/2)2]'/' 



1/2 



1 1 

= 2 + 2- 



<c + Vc + <t>c/'2 



[?cfe + W0c) + (^c + ^c/2)2; 



1/2 



Combining Eq. jlH]) and Eq. (jI10[) . we obtain 

From the system of Eqs. ((I6l) - (jllip . it can be shown that 



t 







(12) 

(13) 
(14) 

(15) 

(16) 

(17) 

(18) 
(19) 

(110) 

(111) 

(112) 



APPENDIX J 

We consider horizontal gene transfer of blocks of variable length in the Eigen model. The Hamiltonian matrix 
elements in the coherent states basis are given, to 0{N^^), by 



{zk\H\zk^ 



{zk\zk- 



X 1 



(M) (M) 



1 ^ 

j^'^zl{j)a3Zk^i{j) 



Nd 



1 ^ 



i=i 



(Jl) 
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We introduce the auxiliary fields 



1 ^ 
1 ^ 



(J2) 



(J3) 



(J4) 



and the corresponding ^k, Vk, 4'k to enforce the constraints via Laplace representations of the Dirac delta functions. 
From Eq. (f90l) . we obtain 



M- 



hm / [vrvzi 



n 

.fc=l 



ieNd£,kd(,k leNdrjkdr^k ieNd(t)kd(t)k 



271 



271 
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We obtain the partition function from the trace of the evolution operator Eq. ([JS 



Z = Tr 



= lim 

A/-+OC 



27r 







N 



nUAj_ 
977 



i=i 



27r ' 



A/ 



n ^'^^'^fe 



By inserting Eq. (jJSp . we obtain 



xe 



X 



2Tr 



N 


27r 








n 


/ 


n T^^kT^^k 








.k=l 





The matrix in Eq. (p7|) is defined by 



= 



/ 







/ 





-A 



V 












I J 



Here Ak = I + eC^c^s + ??fecri + (f>kD). 

After calculating the Gaussian integral over the coherent states fields, we obtain 

,27r N 



lim 

M— »CX3 



n 

27T N 



dX 



27T 



3 g-iA, 



M 



.fc=i 



'Ef=iEf=i5:(i)S.,(i)i-,(i) 



AT 



i=i 
"2^ 



-Q f^^-iAj g-Tr ln[/~e*^J T oxp(e EfLi «fcO-3+f;fc(Ti+</.fc_D)] 



lim rrTrfe^^"=i(«'='^^+**'='^i+^''-^) 



(J5) 



(J6) 



(J7) 



(J8) 



(J9) 
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where, 



Q = Tr Te-I'o ''*'(?'^3+r)cri+0£') 
With this result the partition function in Eq. (p7|) becomes Eq. (pij) . 



(JIO) 



APPENDIX K 

We consider recombination in the Eigen model. The matrix elements of the Hamiltonian operator in the coherent 
states basis are given, to order 0{N), by 



{zk\H\zk-i) 

{Zk\zk-l) 



X [l-v + vg{{zl{j)D]zk^^{])})\f 
"l ^ 

-i;^^zl{j)as.Zk-i{i) 



1 ^ 

-J^^zl{j)azzk-i{j) 



-Nd 



(Kl) 



Here we notice that the function g{{z^{i)D^jZk-i{j)}) is the same as in Eq. (|43p . Therefore, the same analysis 
presented through Eqs. (|43|) - (|45)) regarding the singular behavior of the function g applies for the Eigen model as 

well. Hence, in the large N limit, we have g ^-^ Y^'j=i ^U)^^k-i{j)^ , with D = (Dj) being again the matrix defined 

in Eq. 

Wc introduce the auxiliary fields 



1 ^ 



(K2) 



1 ^ 



(K3) 



1 ^ 



(K4) 



i=i 



and the corresponding conjugate fields ^fc, rik and 0^ to enforce the constraints via Laplace representations of the 
Dirac delta functions. Thus, we have 



lim / \Vz*Vz\ 



M 

n 



ieNd^kd^k ieNdr]kdr]k ieNd(f)kd(t)k 



Lfe=i 

1 -^T^Af 



27r 



27r 



27r 



X |zA/)(2o|e~^ 5i;fc=i Ej=iK0)-2fc(j)+5'fc-i(i)-5*fc-i(i)-2zfc0')-Zfc-i(i)] 

y^f,<^Nj:iL,[e~>^^^-^k)^i^^+^g(^4,^)f(^^^)-d{i^)] 



The partition function is expressed by 
Z = Tr[e-^*P] : 



lim 



A/ 



n ^^"^^^fc 



.fc=i 



20 = e'^ZM 



(K5) 



(K6) 



By inserting Eq. (jKSp . we obtain 



Z ^ lim / [ViV£T>fiVr]V(l3V4)\ 



27r N 

n 



3 g-»A, 



A/ 



n ^^^^^"^ 



.fc=i 



The Gaussian integral can be performed over the coherent state fields, to obtain the representation in Eq. p02p 
the one-dimensional Ising trace is defined by 

Q = Tr Te^o dt'i^aa+rJai+cpD) 



APPENDIX L 

We analyze the effect of introducing different schemes of horizontal gene transfer in the parallel model. 
For the parallel model in the presence of horizontal gene transfer with blocks of size M = 1, we obtain 



du 
dv 



2/'(«o) 

Here, (■foi'^'o) represents the solution for = 0, i.e., they are obtained from the system 



fm = f{uo) = n^o] = /(eo) + /i\/l-eO - M 

From Eq. (|L4[) . we obtain uq from the inverse fimction 



Let us Taylor-expand Eq. (|L5p near x = /(^o), 
with Sx = — — 1). Here, we use the inverse function theorem to obtain the derivatives 



-/"(/-M^l) _ /"(eo) 



Hence, Eq. (jL6p becomes 



From Eq. (|L3p. we have 



"0 = ^0 + — 



Sx f"{^o) {5xf 



f'i^o) (/'(eo))3 2 



Co 
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From Eq. (jLSp into Eq. (jL7p . after multiplying by ^o, we have 



5x 



-Co 



/ (^o) {Sxy 



/'(Co) "'(/'(eo))3 2 

(i-eo'-v/r^) Co /"(Co) 



?0 



2(/'(eo))2 



(LIO) 



Therefore, we finally obtain 



2 (/'(eo))2 M 



(Lll) 



The sign of this expression is clearly determined by — / (■^o); a,nd hence after Eq. (jLl[) we obtain the condition 



du 
dv 



>o if /"(eo) <o 

<0 if /"(eo) >0 



(L12) 



From Eq. (jL12p . we conclude that horizontal gene transfer will enhance selection towards the fittest individuals 
when negative cpistasis is present [/ (u) < 0], while it will introduce an additional load against selection, with the 
corresponding deleterious effect on the mean fitness, when positive cpistasis is present [/ (u) > 0]. This result proves 
that the mutational deterministic hypothesis holds for horizontal gene transfer of blocks of size M = 1 in the parallel 
model. 

For the case of horizontal gene transfer of blocks M > 1, we obtain the equation 



du 
dv 



1 



1 



M 



1 



.^0 Mfiu^) 
We notice by expanding the binomial up to first order, that the leading term in Eq. (|L13|) is 



du 
dv 



2/'K) 



(L13) 



(L14) 



which is identical to Eq. (jLip . and hence the analysis presented for the case M = 1 also applies for M > 1, in 
particular Eq. (|L12p . 

For the process of horizontal gene transfer with multiple-size blocks, with average (M), we obtain the equation 



du 
dv 



{M)f'iuo) 



(L15) 



By expanding the exponential at first order, we obtain that the leading term in this case is also Eq. (jL14[) . which is 
identical to Eq. (jLl[l . Therefore, the analysis presented for M = 1, and in particular Eq. (|L12p applies in this case as 
well. 

In conclusion, we proved that the mutational deterministic hypothesis, expressed in quantitative form by Eq. (|L12[) . 
holds for the different forms of horizontal gene transfer discussed in our work for the parallel model. 



APPENDIX M 



We analyze the effect of introducing different schemes of horizontal gene transfer in the Eigcn model. 
For the Eigen model in the presence of horizontal gene transfer, and for zero degradation rate d{u) = 0, we obtain 
the equation 



du 
dv 



2/'(^i) 



(Ml) 
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The sign of this derivative is determined by the combination uq^o + yl^^o ^ 1' where (^o, uq) represents the solution 
for V = 0, i.e. they are obtained from the system 

m = fiOe-^'-'-^^ (M2) 



5^ 



= 0= /'(Co)- 



(M3) 



By inverting Eq. (jM4p . we obtain uq 



no = f-'m^o]] = /-'[/(eo)e-^^^-^^)] 
We expand Eq. (jMSP near x = f{£,o)- by applying identities Eqs. (jL6l4L9p 

/"(fo) (<SX)2 



"0 = ?0 + 



with (5a; 



-M(i-Vi^) - 1 



/(Co) ~ - Vl^]/(Co). From Eq. (|M3|, we have 



sx _ M[yr^ - i]/(eo) _ 1 - - 



/'(Co) 



6 



From Eq. (|M7|) into Eq. (|M6p . after multiplying by we find 



Wo?o = 4o +?o t,o- 



6 







2 [/'(Co)]= 



l-Vl-Co-/ (?o)^ 



Hence, we obtain 



[/'(&)]= 



(M4) 



(M5) 



(M6) 



(M7) 



(MS) 



(M9) 



Clearly, the sign of this expression is determined by the sign of — / (^o)i and hence after Eq. (|Mip we obtain the 
condition 



du 
dv 



^ I >0 if /'(Co) <o 
.^0 I <0 if /"(Co) >0 



(MIO) 



which proves that the mutational deterministic hypothesis holds for horizontal gene transfer of blocks of size A4 = 1 
in the Eigen model. 

For the case of horizontal gene transfer of blocks of size M > 1 , we obtain the equation 



du 
dv 



r r Q 



Mf'iuo) 



(Mil) 



by 



By expanding the binomial in the numerator of Eq. (jMlip up to first order, we notice that the leading term is given 



du 
du 



Mo6 - 1 + \/i - Co ^^(i-./irjf; 

2/'(uo) 



/(Co) 



(M12) 
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which is identical to Eq. (|Mip . Therefore, the analysis presented for the case M = 1, and in particular Eq. (|M10|) 
applies for M > 1 as well. 

When considering the process of horizontal gene transfer of blocks of multiple size with average (M) , we obtain the 
equation 



du 
dv 



(M) 

e 2 



(«o«o-l+i/l-«o) - 1 
(M)/'(uo) 



/(eo)e 



(M13) 



By expanding the exponential in Eq. (|M13[) up to first order, we notice that the leading term is given by Eq. (|M12[) 
in this case as well, which is identical to Eq. (|M1[) . Therefore, the analysis presented for the process with M = 1, and 
in particular Eq. (|M10[) . applies for the process of horizontal gene transfer of multiple size blocks as well. 

Summarizing, we proved that the mutational deterministic hypothesis, expressed quantitatively in Eq. (|M10p . holds 
for the different forms of horizontal gene transfer studied in this work for the Eigen model. 
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For the case of two-parent recombination in the parallel model, we find that the phase structure is defined by two 
fitness functions. A low z^-dependent phase SI, defined as the maximum in ^ of 



The maximum of this expression, attained at is obtained from the equation 



(Nl) 



(N2) 



We notice that the value is the same as in the absence of recombination, when = 0. Therefore, from the 
self-consistency condition, we obtain for this phase 



(N3) 



Here, we have denoted u^, as the value of the average composition in phase SI, when the recombination rate is v. 
Correspondingly, we also have from Eq. (jN3[) the exact relation 



(N4) 



with f{uo) = J-o[^o] and uq the average composition in the absence of recombination, when v = 0. 

Let us define as the value of the average composition at the S2 phase, which is independent of the recombination 
rate. The value is obtained as the solution of the non-linear equation 



(N5) 



We consider in Eq. (|N4p the value ly = ly* a.t which the average fitness of the SI and S2 phases are identical, as the 



condition u^* ~ u». 

In Eq. (jN6[) . let us consider the Taylor expansion of /(it*) near uq, up to first order in e = — uq, 

1^* = -ef' {u,) + 0{e^) 
We expand Eq. (IN5I) near uq at first order in e = it* — uq, 



(N6) 



(N7) 



/ (wo) + e/ (uq) 



2/i(uo + e) 2fi{uo + e) 



l-{uo + ef 



l-ul 



2ua 
1 - z 



2/_tuo 



2/i 



1 + ^0 



(N8) 
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We solve explicitly for e in Eq. (|N8p . and combine with Eq. (|N7p . to obtain an expression for 



(N9) 



Let us now analyze the sign of v* as a function of the sign of the curvature of the fitness function, as defined by 
/ . We consider the Laurent series of f{u) for small u. That is, 

f{u) = ku-^ 
f'(u) = 

f"{u) = fca(a- (NIO) 

where a > to satisfy the monotonically increasing condition. This family of polynomials provides a representation 
of arbitrary, monotonically increasing functions for small uq- 

The case a = 0, corresponding to a constant identical fitness for all sequence types in the population, possesses the 
trivial solution after Eq. (jN2|) = 0, which implies uq ~ 0. and after Eq. (jNSP u* = 0. Thus a single non-selective 
phase is observed for this case, both in the presence and in the absence of recombination. 

From Eq. (|N10[) . we have / < for a < 1, / > for a > 1 and / = at a = 1. We analyze these possible cases 
separately. From Eq. (|N10|) into Eq. (jN9|) . we have 



kau-^(kau'^ - ^) 

V* = 2__ 

ka{a - IX - 2/xM§ (i_„"|)2 

Case 1: a < 1, /" < 0. 

The denominator in Eq. (jNlip is clearly negative, since a — 1 < in this case. 
The numerator, for Uq 



(Nil) 



kau'^ ~ - kau'^ - 2iiul > (N12) 



Therefore, in this case v* = < 0, and hence — uq > 0. 
Case 2: 1 < a < 2, /" > 0. 

The denominator in Eq. (jNlip . for mq <C 1 and a — 1 > 0, 



1 + 

ka{a - - 2^iul- §- - ka{a - l)u^ - 2fiul > (N13) 

(1 — Wq) 



The numerator is also positive, by the same argument as in Eq. (jN12p . Therefore, in this case u* = > 0, and 
hence u^, < Uq. 

Case 3: a > 2, /" > 0. 

The denominator in Eq. (jNlip . for uq <C 1 and a — 1 > 0, 



1 + 

ka{a - - 2^ug ^ - ka{a - l)u^ - 2/iu^ < (N14) 

The numerator is 



- kau^ - 2nul < (N15) 

1 — Uq 



Therefore, in this case = > 0, and hence — uq < 0. 



For a = 1, we obtain an exact solution from Eq. (jN2p . uq = -^/l + fi^/k"^ — ji/k. This result in Eq. (jNlip yields 
V* = 0, and thus = uo for this particular case. 

For a = 2, we have the analytical solution presented in Eqs. ([7T|) . 



-*--o = Jl-2^^-fl-^^)=\/fl-^^)'-^^-(l-^)<0 (N16) 
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with J^* = ffe > 0. 

Summarizing, we proved that 

f > 0, /" < 

u^-uq= < ' „ (N17) 
\ < 0, / > ^ ' 

This result proves the mutational deterministic hypothesis for two-parent recombination in the parallel model. 



APPENDIX O 

For the case of two-parent recombination in the Eigen model, we find that the phase structure is defined by two 
fitness functions. A low i^-dependent phase SI, defined as the maximum in ^ of 



H'^m^{l-i^)e-^^''^^ (01) 
The maximum of this expression, attained at is obtained from the equation 

d 



Q^^l'n^o] = 



/ (Co) = -f==/(eo) 

[In /(Co)]' = -^t= (02) 

We notice that the value is the same as in the absence of recombination, when 1^ = 0. Therefore, from the 
self-consistency condition, we obtain for this phase 

fil^ = H'^ Ko] = (1 - ^^)^oKo] = fM (03) 

Here, we have denoted Ui, as the value of the average composition in phase SI, when the recombination rate is v. 
Correspondingly, we also have from Eq. ()03|) the exact relation 

f{u,) = (1 - z.)/(uo) (04) 

with f{uo) — J-q[^q] and uq the average composition in the absence of recombination, when i/ = 0. 

Let us define as the value of the average composition at the S2 phase, which is independent of the recombination 
rate. The value is obtained as the solution of the non-linear equation 

1 - 

[ln/(u,)]' = (05) 
We consider in Eq. (|Q3p the value — i/* at which the average fitness of the two phases are equal, as the condition 

111/* — w,^ , 

We take the logarithm of this expression, and Taylor expand up to first order in e = — mq, 

ln(l-j/*)=ln[/(7io + e)]-ln[/(uo)] 

-ly* =e[\nf{uo)]' (07) 
We expand Eq. (jOSP near uq at first order in e = — uo, 

2/_i(ito -I- e) 



[ln/(uo)] +e[ln/(«o)] = 



l-{uo + ef 

1 



2^(wo + e) 



l-ul 



2uo 
1 — uf 



^0 

l^+2M-i±4,. + 0(.^) (08) 
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We solve explicitly for e in Eq. (jOSp . and combine with Eq. (|07|) . to obtain an expression for 

^* = [In fM] j^T^ (09) 



The analysis follows the same lines as in the parallel model case. That is, we analyze the sign of v* after Eq. (j09 
We consider a family of polynomials f(u) = ku°' + kg, which for <C 1 

ln/(M) = In ( 1 + -^uA + In(fco) -^u" + \n{ko) 



ko J ko 



a—u 

ko 



[ln/(u)]" = a(a-l)-^u"-2 (OlO) 
fco 

with a > to satisfy the monotonically increasing condition. This family of polynomials provides a representation of 
smooth and monotonically increasing functions for small uq. 

The case a = corresponds to a constant identical fitness for all sequence types in the population, and possesses 
the trivial solution after Eq. (j02[) = 0, which implies uq = 0, and after Eq. (j05[) = 0. Therefore, a single 
non-selective phase is observed for this case, both in the presence and in the absence of recombination. 

From Eq. (jOlOp . we have / <OforQ:<l,/ >0 for a > 1 and / = at a = 1. We analyze these possible cases 
separately. From Eq. (jOlOp into Eq. (|09p . we have 



IK -2^0^ 

Case 1: a < 1, /" < 0. 

The denominator in Eq. (|011[) is clearly negative, since a — 1 < in this case. 
The numerator, for ^ 1 

^ „.„.« 2/iUQ k ^ o..,,2 



(Oil) 



, -au[^ ^ - —au^ - 2/iM^ > (012) 

ko 1 ~ ""o ''^o 

Therefore, in this case z/* = < 0, and hence — uq > 0. 
Case 2: 1 < a < 2, /" > 0. 

The denominator in Eq. (jOlip . for uq ^ 1 and a — 1 > 0, 

^a{a - l)u^ - 2 (ml ^ "° - ^a(a - l)u^ - 2fiul > (013) 

The numerator is also positive, by the same argument as in Eq. (j012[) . Therefore, in this case v* = ^^^j and hence 
M* < Mo- 
Case 3: a > 2, /" > 0. 

The denominator in Eq. (jOlip . for mo ^ 1 and a — 1 > 0, 

^a{a - l)u^ - 2^,ul-^^^ ^ ^a{a ~ IX - 2,iul < (014) 

ko (1 - u^r fco 

^au'^ - ^ ^au^ - 2fiul < (015) 

ko 1 - "o fco 

Therefore, in this case v* = -^^^ > 0, and hence — uo < 0. 

For « = 1, we find that for « 1 and uo ^ 1, = + O (-^Y, ^ ^ + O (^\\nd uo - 



The numerator is 



k 



- + O (^27^^) ■ Therefore, — Uo — and v* = in this case. 
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For a = 2, we have the exact solution expressed in Eqs. (|113p . (|114p . The region of parameters space where phases 
SI and S2 intersect is < 1. We analyze these formulas considering that < 1 and uq < 1. It is convenient to 

k 

1 



define in this case the small parameter e = 2^^ < 1. From Eq. (j05ll . we have 



Expanding Eq. (jllSp up to first order in e, we obtain the result 



- 0{e) (016) 



uo^y2 ^^^^ — -g(e) (017) 

Therefore, for e ^ 1, from Eq. (j017p and Eq. (j016p . when a = 2, < uq, and hence v* > 0. 
Summarizing, we have shown that 

) > 0, /" < /r^ioN 

M, - ito = < „ (018 

\ < 0, / > 

This result proves the mutational deterministic hypothesis for two-parent recombination in the Eigen model. 
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